
GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: February 7, 2002, 10:51:11 ; Search time 3842.15 Seconds 

(without alignments) 
1807.663 Million cell updates/sec 

Title: US-09-394-745-5950

Perfect score: 421 

Sequence: 1 gggtccaggcacgcgtccga..........agtggcagaatttgtgccgc 421



Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 

Searched: 1472140 seqs, 8248589755 residues 



Total number of hits satisfying chosen parameters: 2944280 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 
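
As a rough consistency check on the header figures, the reported throughput can be reproduced from the query length, the number of residues searched and the search time. This is only a sketch: it assumes (the report does not say so) that both strands were searched, which would also explain why the hit total is exactly twice the number of sequences. The function name is illustrative.

```python
# Figures copied from the report header.
QUERY_LENGTH = 421            # query spans positions 1..421
DB_RESIDUES = 8_248_589_755   # "Searched: ... residues"
DB_SEQS = 1_472_140           # "Searched: ... seqs"
SEARCH_SECONDS = 3842.15      # "Search time ... Seconds"
STRANDS = 2                   # ASSUMPTION: forward and complement strands both searched


def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=STRANDS):
    """Dynamic-programming cells filled per second, in millions."""
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6


rate = million_cell_updates_per_sec(QUERY_LENGTH, DB_RESIDUES, SEARCH_SECONDS)
total_hits = DB_SEQS * STRANDS

print(f"{rate:.3f} Million cell updates/sec")
print(f"{total_hits} hits")
```

Under that assumption the computed rate and hit total can be compared directly with the "Million cell updates/sec" and "Total number of hits" lines above.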

Database : GenEmbl:*
            1:  gb_ba:*
            2:  gb_htg:*
            3:  gb_in:*
            4:  gb_om:*
            5:  gb_ov:*
            6:  gb_pat:*
            7:  gb_ph:*
            8:  gb_pl:*
            9:  gb_pr:*
           10:  gb_ro:*
           11:  gb_sts:*
           12:  gb_sy:*
           13:  gb_un:*
           14:  gb_vi:*
           15:  em_ba:*
           16:  em_fun:*
           17:  em_hum:*
           18:  em_in:*
           19:  em_om:*
           20:  em_or:*
           21:  em_ov:*
           22:  em_pat:*
           23:  em_ph:*
           24:  em_pl:*
           25:  em_ro:*
           26:  em_sts:*
           27:  em_sy:*
           28:  em_un:*
           29:  em_vi:*
           30:  em_htgo_hum:*
           31:  em_htgo_inv:*
           32:  em_htgo_rod:*
           33:  em_htg_hum:*
           34:  em_htg_inv:*
           35:  em_htg_rod:*
           36:  em_htg_other:*


Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                   %
Result           Query
  No.   Score    Match   Length  DB  ID         Description

    1    87.6     20.8     1996   8  AF361600   AF361600 Arabidops
    2    78.8     18.7   104769   8  ATF9G14    AL162973 Arabidops
    3    78.8     18.7   107603   8  AC016661   AC016661 Arabidops
c   4    37.2      8.8   179681   9  AL365225   AL365225 Human DNA
    5    36.6      8.7    74881   2  AC020320   AC020320 Drosophil
c   6    36.6      8.7   114396   9  HS109oFo   AL034547 Human DNA
c   7    36.6      8.7   170869   3  AC011696   AC011696 Drosophil
c   8    36.6      8.7   171831   3  AC007473   AC007473 Drosophil
c   9    36.6      8.7   183215   2  AC023370   AC023370 Homo sapi
c  10    36.6      8.7   278196   3  AE003825   AE003825 Drosophil
   11    36.4      8.6     3911   5  SAU02975   U02975 Squalus aca
c  12    36.2      8.6    94555   9  AL589684   AL589684 Human DNA
c  13    36.2      8.6   168438   2  AC024632   AC024632 Homo sapi
   14    36        8.6      824   8  AF137266   AF137266 Nuphar lu
   15    35.6      8.5   101261   ?  AC010628   AC010628 Homo sapi
   16    35.6      8.5   155185   2  AC023549   AC023549 Homo sapi
c  17    35.6      8.5   156886   2  AC023445   AC023445 Homo sapi
   18    35.6      8.5   168353   2  AL390024   AL390024 Homo sapi
c  19    35.4      8.4   103640   8  AP003278   AP003278 Oryza sat
c  20    35.4      8.4   154966   2  AP001387   AP001387 Homo sapi
c  21    35.4      8.4   155874   2  AC090415   AC090415 Homo sapi
   22    35.4      8.4   166350   2  AP003330   AP003330 Oryza sat
c  23    35.4      8.4   167348   2  AC027780   AC027780 Homo sapi
c  24    35.4      8.4   172693   2  AC074246   AC074246 Homo sapi
   25    35.4      8.4   176697   2  AC021170   AC021170 Homo sapi
c  26    35.4      8.4   192391   9  AC010768   AC010768 Homo sapi
c  27    34.8      8.3     1539   8  HTL2NFR    Z26251 H.tuberosus
c  28    34.8      8.3    95533   2  AC093224   AC093224 Homo sapi
c  29    34.8      8.3   162098   2  AC019032   AC019032 Homo sapi
c  30    34.8      8.3   162671   2  AC034299   AC034299 Homo sapi
   31    34.8      8.3   173334   2  AC091198   AC091198 Homo sapi
   32    34.8      8.3   173585   2  AC021113   AC021113 Homo sapi
c  33    34.8      8.3   216078   2  AC087053   AC087053 Homo sapi
c  34    34.6      8.2   145722   2  AC015503   AC015503 Homo sapi
c  35    34.6      8.2   162125   2  AC073494   AC073494 Homo sapi
c  36    34.6      8.2   165394   8  AC025296   AC025296 Oryza sat
c  37    34.6      8.2   166019   2  AC015648   AC015648 Homo sapi
   38    34.6      8.2   203203   9  AC020910   AC020910 Homo sapi
   39    34.4      8.2        ?   ?  AC026978   AC026978 Homo sapi
   40    34.4      8.2   176179   2  AC092591   AC092591 Homo sapi
   41    34.2      8.1     1452   1  UEU81722   U81722 Unidentifie
c  42    34.2      8.1     1753   9  AK023499   AK023499 Homo sapi
c  43    34.2      8.1   138322   2  AC012241   AC012241 Homo sapi
c  44    34.2      8.1   146072   2  AL360233   AL360233 Homo sapi
   45    34.2      8.1   161992   2  AC079750   AC079750 Homo sapi
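
The "% Query Match" column appears to be the score expressed as a percentage of the perfect score (421). The sketch below checks that reading against a few rows of the summary; the function name is illustrative and the interpretation is an assumption based on the printed numbers, not on vendor documentation.

```python
PERFECT_SCORE = 421  # "Perfect score" from the report header


def query_match_percent(score, perfect=PERFECT_SCORE):
    """Score as a percentage of the perfect score, rounded to one decimal."""
    return round(100.0 * score / perfect, 1)


# (result number, score, reported % query match) copied from the summary listing
rows = [
    (1, 87.6, 20.8),
    (2, 78.8, 18.7),
    (4, 37.2, 8.8),
    (14, 36.0, 8.6),
    (45, 34.2, 8.1),
]

for number, score, reported in rows:
    computed = query_match_percent(score)
    print(f"result {number}: computed {computed}, reported {reported}")
```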



ALIGNMENTS 



RESULT 1
AF361600
LOCUS       AF361600     1996 bp    DNA             PLN       23-MAY-2001
DEFINITION  Arabidopsis thaliana AT5g02970/F9G14_280 gene, complete cds.
ACCESSION   AF361600
VERSION     AF361600.1  GI:13605548
KEYWORDS    FLI_CDNA.
SOURCE      thale cress.
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            Rosidae; eurosids II; Brassicales; Brassicaceae; Arabidopsis.
REFERENCE   1  (bases 1 to 1996)
  AUTHORS   Shinn,P., Chen,H., Cheuk,R., Kim,C.J., Banh,J., Bowser,L.,
            Carninci,P., Chung,M.K., Goldsmith,A.D., Hayashizaki,Y., Ishida,J.,
            Jones,T., Kamiya,A., Karlin-Neumann,G., Kawai,J., Lam,B., Lee,J.M.,
            Lin,J., Liu,S.X., Miranda,M., Narusaka,M., Nguyen,M., Palm,C.J.,
            Pham,P.K., Quach,H.L., Sakano,H., Sakurai,T., Satou,M., Seki,M.,
            Southwick,A., Toriumi,M., Yamada,K., Yu,G., Shinozaki,K.,
            Davis,R.W., Theologis,A. and Ecker,J.R.
  TITLE     Arabidopsis cDNA clones
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1996)
  AUTHORS   Shinn,P., Chen,H., Cheuk,R., Kim,C.J., Banh,J., Bowser,L.,
            Carninci,P., Chung,M.K., Goldsmith,A.D., Hayashizaki,Y., Ishida,J.,
            Jones,T., Kamiya,A., Karlin-Neumann,G., Kawai,J., Lam,B., Lee,J.M.,
            Lin,J., Liu,S.X., Miranda,M., Narusaka,M., Nguyen,M., Palm,C.J.,
            Pham,P.K., Quach,H.L., Sakano,H., Sakurai,T., Satou,M., Seki,M.,
            Southwick,A., Toriumi,M., Yamada,K., Yu,G., Shinozaki,K.,
            Davis,R.W., Theologis,A. and Ecker,J.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (15-MAR-2001) Salk Institute Genomic Analysis Laboratory
            (SIGnAL), Plant Biology Laboratory, The Salk Institute for
            Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037
            USA
COMMENT     RIKEN Genomic Sciences Center (GSC) members carried out the
            collection and clustering of RAFL cDNAs (RAFL cDNA: 'RIKEN
            Arabidopsis Full-Length cDNA'): Seki,M., Narusaka,M., Ishida,J.,
            Satou,M., Kamiya,A., Sakurai,T., Carninci,P., Kawai,J.,
            Hayashizaki,Y. and Shinozaki,K.

            The Salk, Stanford, PGEC (SSP) Consortium members carried out the
            sequencing and annotation of the RAFL cDNAs: Shinn,P., Chen,H.,
            Cheuk,R., Kim,C.J., Koesema,E., Meyers,M.C., Tracy,S.E., Banh,J.,
            Bowser,L., Chung,M.K., Goldsmith,A.D., Jones,T., Karlin-Neumann,G.,
            Lam,B., Lee,J.M., Lin,J., Liu,S.X., Miranda,M., Nguyen,M.,
            Palm,C.J., Pham,P.K., Quach,H.L., Sakano,H., Southwick,A.,
            Tang,C.C., Toriumi,M., Yamada,K., Yu,G., Davis,R.W., Theologis,A.,
            and Ecker,J.R.

            Shinn,P. (SSP/Salk) and Seki,M. (RIKEN GSC) contributed equally to
            this work. Shinozaki,K. (RIKEN GSC) and Ecker,J.R. (SSP/Salk)
            contributed equally to this work as PIs.
FEATURES             Location/Qualifiers
     source          1..1996
                     /organism="Arabidopsis thaliana"
                     /db_xref="taxon:3702"
                     /chromosome="5"
                     /clone="RAFL09-10-O19(R12091)"
                     /note="ecotype: Columbia"
     5'UTR           1..225
     CDS             226..1770
                     /note="unknown protein"
                     /codon_start=1
                     /product="AT5g02970/F9G14_280"
                     /protein_id="AAK32768.1"
                     /db_xref="GI:13605549"
                     /translation="MEEIRGVPTWQEELASLVDAGLRYDGAPIDLTAATKRSGFVSAD
                     GSGSEPKETLKDQVTGFMKSWGEMLLELAKGCKDIVQQTVVTDDSFLVRKLRKPAAKV
                     SKKLSFLNEFLPEDRDPIHAWPVIFFVFLLALAALSFSPENDRPVTVITKLRLHPTGA
                     TRVQLPDGRYIAYQELGVSAERARYSLVMPHSFLSSRLAGIPGVKKSLLVEYGVRLVS
                     YDLPGFGESDPHRGRNLSSSASDMINLAAAIGIDEKFWLLGYSTGSIHTWAGMKYFPE
                     KIAGAAMVAPVINPYEPSMVKEEVVKTWEQWLTKRKFMYFLARRFPILLPFFYRRSFL
                     SGNLDQLDQWMALSLGEKDKLLIKDPTFQEVYQRNVEESVRQGITKPFVEEAVLQVSN
                     WGFTLSEFRTQKKCATNGVLSWLMSMYSEAECELIGFRKPIHIWQGMEDRVAPPSMSD
                     YISRMIPEATVHKIRNEGHFSFFYFCDECHRQIFYALFGEPKGQLERVKETEDTVVET
                     EAHKDT"
     3'UTR           1771..1996
BASE COUNT      601 a    369 c    483 g    543 t
ORIGIN
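
The feature coordinates and base counts of this record can be cross-checked with simple arithmetic. The following sketch (helper name illustrative) confirms that the 5'UTR, CDS and 3'UTR spans tile the 1996 bp record, that the CDS length is a whole number of codons, and that the base counts sum to the record length.

```python
def span_length(start, end):
    """Length of an inclusive, 1-based interval such as 226..1770."""
    return end - start + 1


RECORD_LENGTH = 1996  # from the LOCUS line

utr5 = span_length(1, 225)
cds = span_length(226, 1770)
utr3 = span_length(1771, 1996)

codons = cds // 3
# The stop codon is counted in the CDS span but not in /translation.
expected_protein_length = codons - 1

base_count = {"a": 601, "c": 369, "g": 483, "t": 543}

print("UTR + CDS + UTR:", utr5 + cds + utr3)
print("CDS length:", cds, "codons:", codons, "protein residues:", expected_protein_length)
print("base count total:", sum(base_count.values()))
```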



Query Match 20.8%; Score 87.6; DB 8; Length 1996; 

Best Local Similarity 60.4%; Pred. No. 2.6e-16; 

Matches 180; Conservative 0; Mismatches 114; Indels 4; Gaps 2; 
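
The statistics above can be related to one another numerically. In the sketch below, "Best Local Similarity" is taken to be matches divided by alignment columns. The score is rebuilt from the gap penalties in the header plus a mismatch penalty of 0.6, which is NOT stated anywhere in the report: it is an assumption inferred because it reproduces the printed scores. The gap lengths (one 1-base gap and one 3-base gap) are read off the alignment coordinates.

```python
GAP_OPEN = 10.0    # "Gapop" from the report header
GAP_EXTEND = 1.0   # "Gapext" from the report header
MISMATCH = -0.6    # ASSUMPTION: inferred from the printed scores, not documented here


def best_local_similarity(matches, mismatches, indels):
    """Percentage of alignment columns that are identical."""
    return round(100.0 * matches / (matches + mismatches + indels), 1)


def alignment_score(matches, mismatches, gap_lengths):
    """Identity score: +1 per match, MISMATCH per mismatch, affine gap costs."""
    gap_cost = sum(GAP_OPEN + GAP_EXTEND * n for n in gap_lengths)
    return round(matches + MISMATCH * mismatches - gap_cost, 1)


# Result 1: Matches 180; Mismatches 114; Indels 4; Gaps 2
print(best_local_similarity(180, 114, 4))
print(alignment_score(180, 114, [1, 3]))
```

With the same assumed penalty, the figures printed for Results 2 and 3 (164/102 and 161/97 matches/mismatches, same gaps) also come out at their reported score of 78.8, which is some evidence for the assumption but does not prove it.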

Qy 118 aggacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcag 177

II I I I I I I II I II III III I I I I I I I II I I I M I I 
Db 1280 AGGATAAACTTTTAATCAAAGATCCAACGTTTCAAGAAGTTTATCAAAGGAACGTGGAGG 1339 

Qy 178 agtctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatct 237

I I 1 I I I I I I I I I I II I I I M I I I I I I II I I I I I I I I I I I I I II I 
Db 1340 AATCAGT-CCGTCAAGGAATCACAAAACCATTTGTAGAAGAAGCCGTGCTTCAAGTATCG 1398 

Qy 238 gactggggtttcagcctatctgacatccaactgcagaagaaa---gaggctcaaggcttt 294

MINIMI I I I I I I III I I I I I I I I I I I III I 

Db 1399 AATTGGGGCTTTACTCTTTCGGAATTCCGCACACAGAAGAAATGTGCAACCAACGGTGTC 1458

Qy 295 tttgaactcatcacgtctctgttcaatcatgctgaaaaacagtgggtgggatttctgggc 354 

II I I I I I I I I I I I I I I I I I I I I II I I I I I 

Db 1459 CTTTCTTGGCTCATGTCAATGTACAGTGAAGCCGAATGTGAACTAATCGGATTTCGAAAA 1518



Qy 355 ccaatacatatatcgcaggggatagatgaccgagtgatctcgccctcagtggcagaat 412 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 
Db 1519 CCCATTCACATATGGCAGGGAATGGAGGATCGAGTGGCTCCACCATCAATGAGTGACT 1576
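
The bar lines between each Qy/Db pair mark identical positions, so they can be regenerated from the two sequence rows whenever the printed bars are hard to read. A minimal sketch (the function name is illustrative; it assumes the two rows are already aligned and of equal length, with '-' for gaps):

```python
def match_line(query, subject):
    """Return '|' under identical residues and ' ' elsewhere.

    Comparison is case-insensitive; a gap character ('-') never matches.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    marks = []
    for q, s in zip(query.upper(), subject.upper()):
        marks.append("|" if q == s and q != "-" else " ")
    return "".join(marks)


qy = "aggacagaactttactggaa"  # first 20 columns of Qy 118..177
db = "AGGATAAACTTTTAATCAAA"  # first 20 columns of Db 1280..1339

print(qy)
print(match_line(qy, db))
print(db)
```

Counting the '|' characters over all rows of an alignment gives a figure that can be compared with the "Matches" value printed in its header.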



RESULT 2
ATF9G14
LOCUS       ATF9G14    104769 bp    DNA             PLN       03-APR-2000
DEFINITION  Arabidopsis thaliana DNA chromosome 5, BAC clone F9G14 (ESSA
            project).
ACCESSION   AL162973
VERSION     AL162973.1  GI:7413544
KEYWORDS    .
SOURCE      thale cress.
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            Rosidae; eurosids II; Brassicales; Brassicaceae; Arabidopsis.
REFERENCE   1  (bases 1 to 104769)
  AUTHORS   Bevan,M., Terryn,N., Ardiles,W., Buysshaert,C., Dasseville,R., De
            Clerck,R., De Keyser,A., Neyt,P., Rouze,P., Van Den Daele,H.,
            Villaroel,R., Gielen,J., Van Montagu,M., Bancroft,I., Mewes,H.W.,
            Rudd,S., Lemcke,K. and Mayer,K.F.X.
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 104769)
  AUTHORS   EU Arabidopsis sequencing project.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-APR-2000) MIPS, at the Max-Planck-Institut fuer
            Biochemie, Am Klopferspitz 18a, D-82152 Martinsried, FRG, E-mail:
            lemcke@mips.biochem.mpg.de, mayer@mips.biochem.mpg.de Project
            Coordinator: Mike Bevan, Molecular Genetics Department, Cambridge
            Laboratory, John Innes Centre, Colney Lane, NR4 7UJ Norwich, UK,
            E-mail: michael.bevan@bbsrc.ac.uk
COMMENT     Information on performance of analysis and a more detailed
            annotation of this entry and other sequences of chromosomes 3, 4
            and 5 can be viewed at: http://www.mips.biochem.mpg.de/proj/thal/.
FEATURES             Location/Qualifiers
     source          1..104769
                     /organism="Arabidopsis thaliana"
                     /variety="Columbia"
                     /db_xref="taxon:3702"
                     /chromosome="5"
     misc_feature    1..10970
                     /note="overlap with BAC T22P11"
     gene            complement(join(4857..5159,5958..6113,6244..7155))
                     /gene="F9G14_10"
     gene            4857..7155
                     /gene="F9G14_10"
     CDS             complement(join(4857..5159,5958..6113,6244..7155))
                     /gene="F9G14_10"
                     /note="similarity to various predicted proteins,
                     Arabidopsis thaliana"
                     /codon_start=1
                     /product="putative protein"
                     /protein_id="CAB86024.1"
                     /db_xref="GI:7413545"
                     /translation="MAKRRERRGRRQHRSHRRIQRIIDGADFINYMPDDILHHILSFI
                     PTDLAMRTSVLSRRWRHVWCETPCLDITLKHGAMNQTLTSYTAPIITSFKLVMDLNSN
                     TVPQVDSWIEFALSRNVQNLSVFVRDFTYSKTYRFPDIFYLSSSLKLLDVTLDFFDMI
                     PTCTVSWKSLRNLTLRFCQIPDESIHNILSGCPILESLTLDTCRLLERLDLSKSPNLR
                     RLDINQQYRRTGPVAIVAPHIYYLRLTYSSTPSTIVDVSSLSEANLTIISSLLSPLTA
                     DGYQTMALEMLSKFHNVKRLTVGETLLQILSLAELRGVPFPTLKVQTLTVKTEFVRSV
                     IPGISRLLQNSPGLKKLRPSTMKMHHLKGLYPDQCWRSTCEVFPTSKEIYKMLGCNDA
                     TLKLVASFMDLVLRNAKTLERMVVWLGGIYFNGDAPWFEEELFDMVETLSRNNNVSIL
                     LKQSNC"
     exon            complement(4857..5159)
                     /gene="F9G14_10"
                     /number=1
     intron          complement(5160..5957)
                     /number=1
     exon            complement(5958..6113)
                     /gene="F9G14_10"
                     /number=2
     intron          complement(6114..6243)
                     /number=2
     exon            complement(6244..7155)
                     /gene="F9G14_10"
                     /number=3
     exon            8104..8370
                     /gene="F9G14_20"
                     /number=1
     gene            8104..8956
                     /gene="F9G14_20"
     CDS             join(8104..8370,8463..8541,8629..8716,8860..8956)
                     /gene="F9G14_20"
                     /codon_start=1
                     /product="putative protein"
                     /protein_id="CAB86025.1"
                     /db_xref="GI:7413546"
                     /translation="MDMSLALSTAPMSRTIISATRRSQVSQPKAKKVKPANKRPTMST
                     SGFSGGTTKELTWKCVEGCGACCKIAKDFSFATPDEIFDNPDDVELYRSMIGDDGWCL
                     NYDKATRKCSIYADRPYFCRVEPEVFKSLYGIEEKKFNKEAVSCCIDTIKTIYGPDSK
                     ELDSFNRAIRSNPSSS"
     intron          8371..8462
                     /gene="F9G14_20"
                     /number=1
     exon            8463..8541
                     /gene="F9G14_20"
                     /number=2
     intron          8542..8628
                     /gene="F9G14_20"
                     /number=2
     exon            8629..8716
                     /gene="F9G14_20"
                     /number=3
     intron          8717..8859
                     /gene="F9G14_20"
                     /number=3
     exon            8860..8956
                     /gene="F9G14_20"
                     /number=4
     exon            complement(9258..9481)
                     /gene="F9G14_30"
                     /number=1
     gene            complement(join(9258..9481,9597..9789))
                     /gene="F9G14_30"
     gene            9258..9789
                     /gene="F9G14_30"
     CDS             complement(join(9258..9481,9597..9789))
                     /gene="F9G14_30"
                     /note="similarity to predicted protein, Arabidopsis
                     thaliana"
                     /codon_start=1
                     /product="putative protein"
                     /protein_id="CAB86026.1"
                     /db_xref="GI:7413547"
                     /translation="MTGIGGEKCKRCGIYEQGSLVSDKEFDWEVCPTDFSASQVYMH
                     FKEKEINATFVCHGCAKFHSAVAASSPQEEGYNGLTFMIAIIAGVLCTTLVVVGGVFM
                     FKHTQRMKKQRDQARFMQLFEESDEPEDELGLDPVI"
     intron          complement(9482..9596)
                     /number=1
     exon            complement(9597..9789)
                     /gene="F9G14_30"
                     /number=2
     tRNA            10120..10201
                     /note="tRNA predict as a tRNA-Ser : anticodon cga"
     CDS             11086..11703
                     /gene="F9G14_40"
                     /note="similarity to pathogenisis-related protein 1.2,
                     Triticum aestivum, EMBL:AJ007349)"
                     /codon_start=1
                     /product="pathogenesis related protein-like"
                     /protein_id="CAB86027.1"
                     /db_xref="GI:7413548"
                     /translation="MSSSSLYHPFCLSISSVLLLLLLIFSGEFPSTAGTSSPDTKAAA
                     ARATNRGRRNKQSAEFLLAHNAARVASGASNLRWDQGLARFASKWAKQRKSDCKMTHS
                     GGPYGENIFRYQRSENWSPRRVVDKWMDESLNYDRVANTCKSGAMCGHYTQIVWRTTT
                     AVGCARSKCDNNRGFLVICEYSPSGNYEGESPFDIPKITLKLDKI"
     exon            11086..11703
                     /gene="F9G14_40"
                     /number=1
     gene            11086..11703
                     /gene="F9G14_40"
     gene            11993..13743
                     /gene="F9G14_50"
     exon            11993..12170
                     /gene="F9G14_50"
                     /number=1
     CDS             join(11993..12170,12466..12607,12653..12806,12920..12967,
                     13093..13193,13611..13743)
                     /gene="F9G14_50"
                     /codon_start=1
                     /product="putative protein"
                     /protein_id="CAB86028.1"
                     /db_xref="GI:7413549"
                     /translation="MSITRSAIRSSLCLRKLDPEISRSSLPFLQEWRKCLSTATEQPP
                     PASPLPPPPGGSPGEGRFYGKFSGFSKHALKTDIMNVLEGCSLTSDDLKFNYPRGGNL
                     TPAAVEEFESKLMNFYPFSFSTMLVYISFVQFPSLSAYDKALRNIAKKGKLYRLEKAA
                     RAQWDPIVPYEGKVVALHGIPVNAITDDIDRFLSGCLYYPGSIQFLTVQGLGTSKRVA
                     LVRFTSQTQAMNAYITKNRNFLLNQRITLQVLQ"
     intron          12171..12465
                     /gene="F9G14_50"
                     /number=1
     exon            12466..12607
                     /gene="F9G14_50"
                     /number=2
     intron          12608..12652
                     /gene="F9G14_50"
                     /number=2
     exon            12653..12806
                     /gene="F9G14_50"
                     /number=3
     intron          12807..12919
                     /gene="F9G14_50"
                     /number=3
     exon            12920..12967
                     /gene="F9G14_50"
                     /number=4
     intron          12968..13092
                     /gene="F9G14_50"
                     /number=4
     exon            13093..13193
                     /gene="F9G14_50"
                     /number=5
     intron          13194..13610
                     /gene="F9G14_50"
                     /number=5
     exon            13611..13743
                     /gene="F9G14_50"
                     /number=6
     exon            15604..16428
                     /gene="F9G14_60"
                     /number=1
     gene            15604..16428
                     /gene="F9G14_60"
     CDS             15604..16428
                     /gene="F9G14_60"


Query Match 18.7%; Score 78.8; DB 8; Length 104769;

Best Local Similarity 60.7%; Pred. No. 2.1e-13;

Matches 164; Conservative 0; Mismatches 102; Indels 4; Gaps 2;



Qy 110 cattattcaggacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaa 169 

I I I I I I I I I I I I I I I I I II III III I I I I I I I I I I II 
Db 92312 CATCTTTCAGGATAAACTTTTAATCAAAGATCCAACGTTTCAAGAAGTTTATCAAAGGAA 92371 

Qy 170 tgttgcagagtctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgc 229

III I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I 
Db 92372 CGTGGAGGAATCAGT-CCGTCAAGGAATCACAAAACCATTTGTAGAAGAAGCCGTGCTTC 92430 

Qy 230 aagtatctgactggggtttcagcctatctgacatccaactgcagaagaaa---gaggctc 286

I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I 

Db 92431 AAGTATCGAATTGGGGCTTTACTCTTTCGGAATTCCGCACACAGAAGAAATGTGCAACCA 92490



Qy 287 aaggcttttttgaactcatcacgtctctgttcaatcatgctgaaaaacagtgggtgggat 346

III I II I I I I II I I I I I I I I I I I I I I I I II 

Db 92491 ACGGTGTCCTTTCTTGGCTCATGTCAATGTACAGTGAAGCCGAATGTGAACTAATCGGAT 92550



Qy 347 ttctgggcccaatacatatatcgcagggga 376 

III I I II I I I I I I I I I II I I 

Db 92551 TTCGAAAACCCATTCACATATGGCAGGTGA 92580 
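
Feature locations in the record for this result, such as the CDS of gene F9G14_10, combine the complement and join operators. The sketch below shows one way such a location string could be broken into intervals and summed. It is a simplified parser written for illustration (function names are hypothetical) and handles only plain spans, join(...), complement(...) and the partial markers '<' and '>'.

```python
import re


def parse_location(location):
    """Return (is_complement, [(start, end), ...]) for a simple location.

    Whitespace is ignored so that OCR-spaced strings still parse.
    """
    text = location.replace(" ", "")
    is_complement = text.startswith("complement(")
    spans = [(int(a), int(b)) for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", text)]
    return is_complement, spans


def total_length(spans):
    """Summed length of inclusive, 1-based intervals."""
    return sum(end - start + 1 for start, end in spans)


comp, spans = parse_location("complement(join(4857..5159,5958..6113,6244..7155))")
length = total_length(spans)

print(comp, spans)
print(length, "nt =", length // 3, "codons")
```

For this CDS the three exons sum to a multiple of three, as expected for a complete coding region; subtracting the stop codon gives the number of residues the /translation qualifier should contain.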



RESULT 3
AC016661
LOCUS       AC016661   107603 bp    DNA             PLN       16-MAR-2001
DEFINITION  Arabidopsis thaliana chromosome 3 BAC F11F8 genomic sequence,
            complete sequence.
ACCESSION   AC016661
VERSION     AC016661.7  GI:12484383
KEYWORDS    HTG.
SOURCE      thale cress.
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Embryophyta; Tracheophyta; Spermatophyta;
            Magnoliophyta; eudicotyledons; core eudicots; Rosidae; eurosids II;
            Brassicales; Brassicaceae; Arabidopsis.
REFERENCE   1  (bases 1 to 107603)
  AUTHORS   Lin,X., Kaul,S., Town,C.D., Benito,M., Creasy,T.H., Haas,B.J.,
            Wu,D., Maiti,R., Ronning,C.M., Koo,H., Fujii,C.Y., Utterback,T.R.,
            Barnstead,M.E., Bowman,C.L., White,O., Nierman,W.C. and Fraser,C.M.
  TITLE     Arabidopsis thaliana chromosome 3 BAC F11F8 genomic sequence
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 107603)
  AUTHORS   Lin,X. and Kaul,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-DEC-1999) The Institute for Genomic Research, 9712
            Medical Center Dr, Rockville, MD 20850, USA, xlin@tigr.org
REFERENCE   3  (bases 1 to 107603)
  AUTHORS   Lin,X.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JAN-2001) The Institute for Genomic Research, 9712
            Medical Center Dr., Rockville, MD 20850, USA
REFERENCE   4  (bases 1 to 107603)
  AUTHORS   Town,C.D. and Kaul,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-MAR-2001) The Institute for Genomic Research, 9712
            Medical Center Dr, Rockville, MD 20850, USA, cdtown@tigr.org
COMMENT     On Jan 25, 2001 this sequence version replaced gi:12280748.
            Address all correspondence to: at@tigr.org

            BAC clone F11F8 is from Arabidopsis thaliana chromosome 3
            The orientation of the sequence is from SP6 to T7 end of the BAC
            clone.

            Genes were identified by a combination of several methods: Gene
            prediction programs including Genscan+ (Chris Burge,
            http://CCR-081.mit.edu/GENSCAN.html), GeneMarkHMM (Mark Borodovsky,
            http://genemark.biology.gatech.edu/GeneMark/), GlimmerA (a variant
            of GlimmerM, see Mihaela Pertea,
            http://www.tigr.org/softlab/glimmerm_htm/glimmerm.html, and
            GeneSplicer (Mihaela Pertea and Steven Salzberg, contact
            mpertea@tigr.org), searches of the complete sequence against a
            peptide database and the plant EST database at TIGR
            (http://www.tigr.org/tdb/tgi.shtml). Annotated genes are named to
            indicate the level of evidence for their annotation. Genes with
            similarity to other proteins are named after the database hits.
            Genes without significant peptide similarity but with EST
            similarity are named as unknown proteins. Genes without protein
            or EST similarity, that are predicted by more than two gene
            prediction programs over most of their length are annotated as
            hypothetical proteins. Genes encoding tRNAs are predicted by
            tRNAscan-SE (Sean Eddy, http://genome.wustl.edu/eddy/tRNAscan-SE/).
            Simple repeats are identified by repeatmasker (Arian Smit,
            http://ftp.genome.washington.edu/RM/RepeatMasker.html).
FEATURES             Location/Qualifiers
     source          1..107603
                     /organism="Arabidopsis thaliana"
                     /cultivar="Columbia"
                     /db_xref="taxon:3702"
                     /chromosome="3"
                     /map="m532"
                     /clone="F11F8"
     misc_feature    1..8479
                     /note="overlap with BAC clone F3L24
                     (AC011436:98210..106688)."
     mRNA            complement(join(<1631..3543,3793..>4006))
                     /gene="F11F8.1"
     gene            complement(1631..4006)
                     /gene="F11F8.1"
                     /note="identical to GB:CAA76606"
     CDS             complement(join(1808..3543,3793..4006))
                     /gene="F11F8.1"
                     /codon_start=1
                     /product="heat shock cognate 70kD protein"
                     /protein_id="AAF23276.1"
                     /db_xref="GI:6682224"
                     /translation="MAGKGEGPAIGIDLGTTYSCVGVWQHDRVEIIANDQGNRTTPSY
                     VAFTDSERLIGDAAKNQVAMNPINTVFDAKRLIGRRFTDSSVQSDIKLWPFTLKSGPA
                     EKPMIVVNYKGEDKEFSAEEISSMILIKMREIAEAYLGTTIKNAVVTVPAYFNDSQRQ
                     ATKDAGVIAGLNVMRIINEPTAAAIAYGLDKKATSVGEKNVLIFDLGGGTFDVSLLTI
                     EEGIFEVKATAGDTHLGGEDFDNRMVNHFVQEFKRKNKKDISGNPRALRRLRTACERA
                     KRTLSSTAQTTIEIDSLFDGIDFYAPITRARFEELNIDLFRKCMEPVEKCLRDAKMDK
                     NSIDDVVLVGGSTRIPKVQQLLVDFFNGKELCKSINPDEAVAYGAAVQAAILSGEGNE
                     KVQDLLLLDVTPLSLGLETAGGVMTVLIQRNTTIPTKKEQVFSTYSDNQPGVLIQVYE
                     GERARTKDNNLLGKFELSGIPPAPRGVPQITVCFDIDANGILNVSAEDKTTGQKNKIT
                     ITNDKGRLSKDEIEKMVQEAEKYKSEDEEHKKKVDAKNALENYAYNMRNTIRDEKIGE
                     KLAGDDKKKIEDSIEAAIEWLEANQLAECDEFEDKMKELESICNPIIAKMYQGGEAGG
                     PAAGGMDEDVPPSAGGAGPKIEEVD"
     mRNA            join(<5922..6455,6539..6706,6840..>8465)
                     /gene="F11F8_2"
     gene            5922..8465
                     /gene="F11F8_2"
                     /note="predicted by genscan, similar to hypothetical
                     protein GB:AAF14039"
     CDS             join(5922..6455,6539..6706,6840..8465)
                     /gene="F11F8_2"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="AAF23277.1"
                     /db_xref="GI:6682225"
                     /translation="MRDTTWLERLGLALRTAMACLIVSLTTLYGPKPLRHFTTFPAFS
                     YLTTILIWLSDAEPTYGEVLKCCLDVSYATFQTIAIALVSVLVVGPASLGNGLVAPVA
                     VALASFIVAFPVSTSLLTKRIAFGQIVWYVTFVVFNGEVAHVFMLPVHVAGSTALGA
                     IASLIAVLLPFPRLAHSQMSKGCKLYAENALERLNMFVEIMMARDNTTAQVLIARAAS
                     LSAAAKNTLKNIKIHHERISWERPDTRFLSRKQKLDPAEKLHATDFLLRGLELALGSC
                     SSFPQGMSRDELTRLLEGPRTHIAPRSESTLKSQDSLGWHHEAESLSTAALPVCFFRY
                     CVELFRGDFLSLRQDSKSVNGRTTEEEIHPANEGLSMARKFWDILCVWMARERFVFAF
                     KCSISLGLAVLFGILYNKNNGYWSGLTVAISLVSGRQATLTVANSRLQGTAMGSVYGL
                     ICCSVFQRLEEFRFLPLLPWIILAVFMRHSKVYGQPGGVTAAIAALLILGRRNYGAPT
                     EFAIARIVEASIGLLCFVFGEILVTPARAATLARTEISHCLDALLDCIQSLVLCSEQK
                     NQKVVADLRKSQVKLKSHVEALERFAAEALTEPKIPFLRRLNTDSYNRLLGSFSKISD
                     LCLYVCDGLKNLSGVQPTLAFPWDNITHELRAFQEKLHPSVKCLKEISQTKSQARLQK
                     ELQKRKICHDVEAGTTSNDNYSYMELGPSQADVERFSVSFVMLLKEATDKISCNTADD
                     AFKSETALCLSSLGFCISRLMQETICIMTEITHTT"
     mRNA            complement(<9001..>9108)
                     /gene="F11F8_3"
     gene            complement(9001..9108)
                     /gene="F11F8_3"
     CDS             complement(9001..9108)
                     /gene="F11F8_3"
                     /codon_start=1
                     /product="unknown protein"
                     /protein_id="AAF23278.1"
                     /db_xref="GI:6682226"
                     /translation="MFDDQDLGFFANFLGIFIFILVIAYHFVMADPKFE"
     mRNA            complement(join(<10934..11033,11335..11585,11949..12062,
                     12157..12220,12325..12482,12614..>12784))
                     /gene="F11F8_4"
     gene            complement(10934..12784)
                     /gene="F11F8_4"
                     /note="predicted by genefinder"
     CDS             complement(join(10934..11033,11335..11585,11949..12062,
                     12157..12220,12325..12482,12614..12784))
                     /gene="F11F8_4"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="AAF23279.1"
                     /db_xref="GI:6682227"
                     /translation="MESRNDEEAPLISASGEDRKVRAGKCYTRDVHILSISFLLIFLA
                     YGAAQNLETTVNKDLGTISLGILYVSFMFCSMVASLVVRLMGSKNALVLGTTGYWLFV
                     AANLKPSWFTMVPASLYLGFAASIIWVGQGTYLTSIARSHATDHGLHEGSVIGVFNGE
                     FWAMFACHQEGSTSGTTLLMLYFLFSMTLGTILMFFIRKIDGEDGKGPVGSPVGLVDS
                     LASLPRMIITPLLDIRMLLIVPLLAYSGLQQAFVWAEFTKEIVTPAIGVSGVGGAMAV
                     YGALDAVVS"
     mRNA            complement(<13264..>13644)
                     /gene="F11F8_5"
     gene            complement(13264..13644)
                     /gene="F11F8_5"
                     /note="similar to histone H2B 3 GB:CAA12231 from
                     (Lycopersicon esculentum)"
     CDS             complement(13264..13644)
                     /gene="F11F8_5"
                     /codon_start=1
                     /product="putative histone H2B"
                     /protein_id="AAF23280.1"
                     /db_xref="GI:6682228"
                     /translation="MAPKAEKKPSEKAPKADKKITKEGGSERKKKTKKSTETYKIYLF
                     KVLKQVHPDIGISGKAMGIMNSFINDTFEKIALESSRLARYNKKPTITSREIQTAVRL
                     VLPGELAKHAVSEGTKAVTKFTSS"
     repeat_region   complement(13925..13962)
                     /rpt_family="(CAGAG)n"
     mRNA            <14010..>15014
                     /gene="F11F8_6"
     gene            14010..15014
                     /gene="F11F8_6"
                     /note="predicted by genscan"
     CDS             14010..15014
                     /gene="F11F8_6"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="AAF23281.1"
                     /db_xref="GI:6682229"
                     /translation="MASPATFQFSRTKLNLQINLVERKSPSKLSFPLLFRDQAKSTPI
                     RFPVIRASSSSSASNHKPSLLKTTFISLTAAAALFSASFYFVNKRAAMTPVAVVETTL
                     EKHLETQSNDVNALSLLTEIKFKSDKHEQAIVFLDRLIEIEPYERKWPAMKARILSYH
                     GKSESAIEAFEEILEKDPIRVDAYHYLVMEYYNSKPKLTEIEKRINKVIRRCKKEKKT
                     KEILGFRMLIAQIRFIEGKSVEAIRICEELVKEDPNDFTIYLFQGVVYTLMNKGEEAA
                     KQFEHVARVIPRNHPSRETAARTTNSNEWRVIVAYDNVYCYLSTFARLSMASLFNKFG"
     mRNA            join(<15421..15424,15520..15655,15920..16076,
                     16195..>16269)
                     /gene="F11F8_7"
     gene            15421..16269
                     /gene="F11F8_7"
                     /note="similar to 60S ribosomal protein L35 GB:AAC27830"
     CDS             join(15421..15424,15520..15655,15920..16076,16195..16269)
                     /gene="F11F8_7"
                     /codon_start=1
                     /product="putative 60S ribosomal protein L35"
                     /protein_id="AAF23282.1"
                     /db_xref="GI:6682230"
                     /translation="MARIKVHELREKSKSDLQNQLKELKAELALLRVAKVTGGAPNKL
                     SKIKVVRKSIAQVLTVSSQKQKSALREAYKNKKLLPLDLRPKKTRAIRRRLTKHQASL
                     KTEREKKKEMYFPIRKYAIKV"
     repeat_region   complement(18349..18451)
                     /rpt_family="AT_rich"



Query Match 18.7%; Score 78.8; DB 8; Length 107603;

Best Local Similarity 61.5%; Pred. No. 2.1e-13;

Matches 161; Conservative 0; Mismatches 97; Indels 4; Gaps 2;



Qy 115 ttcaggacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttg 174 

I I I I I I I I I I III II I I I I I I I II I I II I I I I I I I I I 

Db 72356 TTCAGGACAAACTTGTAACCGCAGATCCAGTTTTTGAAGATCTTTACCAAAGGAATGTGG 72415 

Qy 175 cagagtctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagta 234 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II III 
Db 72416 AGGAGTCTGTAC-GCCAAGGAACTGCAAAACCATTTGTGGAAGAAGCCGCATTACAGGTA 72474 

Qy 235 tctgactggggtttcagcctatctgacatccaactgcagaagaa---agaggctcaaggc 291

II I I I I I I I I I I I I III I I I I II I I I I I I I I I I I I I 

Db 72475 TCAAATTGGGGCTTTAGTCTTCCCGAGTTCCACATGCAGAAGAAGTGTAGAACCAATGGC 72534



Qy 292 ttttttgaactcatcacgtctctgttcaatcatgctgaaaaacagtgggtgggatttctg 351 
I I I I II I I I I I I I I I I I I I I I I I I I I I 



Db 72535 GTCCTCTCTTGGCTAATGTCAATGTACAGTGAATCCGAATGTGAACTAATTGGTTTTCGG 72594 



Qy 352 ggcccaatacatatatcgcagg 373 

II I I I I I Mil I I I I I 
Db 72595 AAACCTATACACATATGGCAGG 72616
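
Many entries in the summary, and the next result (AL365225/c), carry a "c" flag. In reports of this kind that flag conventionally marks a hit found on the complementary strand; assuming that reading, the strand actually compared is the reverse complement of the printed one. A minimal sketch of that operation (function name illustrative):

```python
# Translation table for the four unambiguous bases, preserving case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


query_start = "gggtccaggcacgcgtccga"  # first 20 bases of the query, from the header

print(query_start)
print(reverse_complement(query_start))
```

Characters outside a, c, g, t (for example ambiguity codes or gap symbols) pass through this table unchanged, so a production version would need to extend it.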



RESULT 4 
AL365225/c 
LOCUS 

DEFINITION 

ACCESSION 
VERSION 
KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 
TITLE 
JOURNAL 



COMMENT 



FEATURES 

source 



AL365225 179681 bp DNA PRI 06-JUN-2001 

Human DNA sequence from clone RP11-17 9A5 on chromosome 1, complete 
sequence . 
AL365225 

AL365225.12 GI:14329978 

HTG. 

human . 

Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
1 (bases 1 to 179681) 
Hall, R. 

Direct Submission 

Submitted { 03- JUN-2001 ) Sanger Centre, Hinxton, Cambridgeshire, 
CB10 ISA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone 
requests : doner equest@ Sanger .ac.uk 

On Jun 8, 2001 this sequence version replaced gi:14280436. 
During sequence assembly data is compared from overlapping clones. 
Where differences are found these are annotated as variations 
together with a note of the overlapping clone name. Note that the 
variation annotation may not be found in the sequence submission 
corresponding to the overlapping clone, as we submit sequences with 
only a small overlap as described above. 

This sequence was finished as follows unless otherwise noted: all 
regions were either double-stranded or sequenced with an alternate 
chemistry or covered by high quality data (i.e., phred quality >= 
30) ; an attempt was made to resolve all sequencing problems, such 
as compressions and repeats; all regions were covered by at least 
one plasmid subclone or more than one M13 subclone; and the 
assembly was confirmed by restriction digest. The following 
abbreviations are used to associate primary accession numbers given 
in the feature table with their source databases: Em:, EMBL; Sw:, 
SWISSPROT; Tr:, TREMBL; Wp:, WORMPEP; Information on the WORMPEP 
database can be found at 

http : //www . Sanger . ac . uk/Projects/C_elegans/wormpep This sequence 
was generated from part of bacterial clone contigs of human 
chromosome 1, constructed by the Sanger Centre Chromosome 1 Mapping 
Group. Further information can be found at 
http : //www . Sanger . ac . uk/HGP/Chrl 

RP11-179A5 is from the library RPCI-11.1 constructed by the group 
of Pieter de Jong. For further details see 
http : //www . chori . org/bacpac/home . htm 
VECTOR: pBACe3.6 

This sequence is the entire insert of clone RP11-179A5 The true 
left end of clone RP11-512F24 is at 131246 in this sequence. The 
true right end of clone RP4-658C17 is at 82886 in this sequence. 

                     Location/Qualifiers
                     1..179681
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="1"
                     /clone="RP11-179A5"
                     /clone_lib="RPCI-11.1"
     repeat_region   1378..1685
                     /note="HAL1 repeat: matches 1021..1356 of consensus"
     repeat_region   1695..1828
                     /note="L1MD3 repeat: matches 7590..7738 of consensus"
     repeat_region   1836..2066
                     /note="L1MD2 repeat: matches 5889..6122 of consensus"
     repeat_region   2067..2357
                     /note="AluSp repeat: matches 1..293 of consensus"
     repeat_region   2358..2453
                     /note="L1MD2 repeat: matches 5795..5890 of consensus"
     repeat_region   2534..2708
                     /note="MIR repeat: matches 72..261 of consensus"
     repeat_region   3700..3861
                     /note="MIR repeat: matches 44..212 of consensus"
     repeat_region   3864..4681
                     /note="L1MC5 repeat: matches 7056..7913 of consensus"
     repeat_region   4697..4849
                     /note="L2 repeat: matches 1376..1526 of consensus"
     repeat_region   5364..5449
                     /note="43 copies 2 mer tc 62% conserved"
     repeat_region   6192..6269
                     /note="3 copies 26 mer 80% conserved"
     repeat_region   6208..6267
                     /note="30 copies 2 mer ca 90% conserved"
     repeat_region   8988..9265
                     /note="AluSc repeat: matches 1..281 of consensus"
     repeat_region   10039..10232
                     /note="L1PA7 repeat: matches 5951..6143 of consensus"
     repeat_region   10509..10634
                     /note="L1PA7 repeat: matches 6018..6143 of consensus"
     repeat_region   10929..11087
                     /note="MLT1G repeat: matches 31..179 of consensus"
     repeat_region   11465..11644
                     /note="MER20 repeat: matches 1..217 of consensus"
     repeat_region   13522..13877
                     /note="MLT2FB repeat: matches 2..366 of consensus"
     repeat_region   14273..14667
                     /note="MSTD repeat: matches 1..394 of consensus"
     repeat_region   14884..15088
                     /note="L2 repeat: matches 2495..2710 of consensus"
     repeat_region   17250..17341
                     /note="23 copies 4 mer tcct 80% conserved"
     repeat_region   17753..17907
                     /note="MIR repeat: matches 17..192 of consensus"
     repeat_region   18170..18505
                     /note="MER2 repeat: matches 1..345 of consensus"
     repeat_region   18903..19195
                     /note="L1MC3 repeat: matches 7357..7735 of consensus"
     repeat_region   19838..20022
                     /note="AluY repeat: matches 124..305 of consensus"
     repeat_region   20309..20472
                     /note="L1M4 repeat: matches 3852..4017 of consensus"
     repeat_region   20495..21223
                     /note="L2 repeat: matches 1363..2199 of consensus"
     repeat_region   22711..22834
                     /note="MER5B repeat: matches 54..176 of consensus"
     repeat_region   22835..22874
                     /note="20 copies 2 mer gt 100% conserved"
     repeat_region   22889..23170
                     /note="AluJb repeat: matches 1..282 of consensus"
     repeat_region   25201..25252
                     /note="Alu repeat: matches 251..302 of consensus"
     repeat_region   25257..26124
                     /note="L1MC5 repeat: matches 6954..7777 of consensus"
     repeat_region   26136..26611
                     /note="L1MC/D repeat: matches 5316..6983 of consensus"
     repeat_region   26852..28167
                     /note="L1MC3 repeat: matches 6435..7739 of consensus"
     repeat_region   28752..28861
                     /note="L1MC/D repeat: matches 5484..5583 of consensus"
     repeat_region   29223..29533
                     /note="AluY repeat: matches 1..311 of consensus"
     repeat_region   29534..29654
                     /note="L2 repeat: matches 2378..2491 of consensus"
     repeat_region   29687..29836
                     /note="MIR repeat: matches 31..218 of consensus"
     repeat_region   29883..30020
                     /note="L2 repeat: matches 2162..2295 of consensus"
     repeat_region   30211..30508
                     /note="AluSx repeat: matches 1..298 of consensus"
     repeat_region   31890..32233
                     /note="L2 repeat: matches 1804..2145 of consensus"
     repeat_region   32234..32544
                     /note="AluSq repeat: matches 1..311 of consensus"
     repeat_region   32545..33119
                     /note="L2 repeat: matches 2145..2750 of consensus"
     repeat_region   34189..34352
                     /note="L2 repeat: matches 691..851 of consensus"
     repeat_region   34606..34936
                     /note="L2 repeat: matches 2130..2481 of consensus"
     repeat_region   34937..35158
                     /note="MER45C repeat: matches 1..255 of consensus"
     repeat_region   35273..35461
                     /note="MER45C repeat: matches 741..952 of consensus"
     repeat_region   35462..35704
                     /note="L2 repeat: matches 2474..2750 of consensus"
     repeat_region   36266..36356
                     /note="L2 repeat: matches 2409..2502 of consensus"
     repeat_region   36501..36560
                     /note="L2 repeat: matches 2646..2705 of consensus"
     repeat_region   37841..38131
                     /note="AluSp repeat: matches 1..297 of consensus"
     repeat_region   38207..38468
                     /note="AluJb repeat: matches 1..301 of consensus"
     misc_feature    complement(38675..39177)
                     /note="match: STS: Em:HSPE17E05"
     repeat_region   39185..39486
                     /note="AluSg repeat: matches 1..302 of consensus"
     repeat_region   39632..40051
                     /note="L1PB1 repeat: matches 5710..6146 of consensus"
     repeat_region   40063..40380
                     /note="AluJb repeat: matches 1..295 of consensus"
     repeat_region   40388..40527
                     /note="L1MC5 repeat: matches 7654..7779 of consensus"
     repeat_region   40659..41529
                     /note="L1MD2 repeat: matches 5430..6329 of consensus"
     repeat_region   41710..41835
                     /note="L1MB8 repeat: matches 5833..5955 of consensus"
     repeat_region   41836..42139
                     /note="AluY repeat: matches 1..302 of consensus"
     repeat_region   42140..42696
                     /note="L1MB8 repeat: matches 5258..5833 of consensus"
     repeat_region   42783..43143
                     /note="L2 repeat: matches 2346..2710 of consensus"
     repeat_region   45083..45209
                     /note="MIR repeat: matches 71..207 of consensus"
     repeat_region   45441..45486
                     /note="23 copies 2 mer tt 73% conserved"
     repeat_region   45488..45705
                     /note="AluJo repeat: matches 39..256 of consensus"
     repeat_region   47081..47215
                     /note="MIR repeat: matches 47..191 of consensus"
     repeat_region   47260..47496
                     /note="MIR repeat: matches 1..262 of consensus"
     repeat_region   48297..49263
                     /note="L2 repeat: matches 1727..2701 of consensus"
     repeat_region   49271..49322
                     /note="L2 repeat: matches 2649..2700 of consensus"
     repeat_region   52241..52671
                     /note="MLT1H repeat: matches 73..547 of consensus"
     repeat_region   52971..53230
                     /note="AluJb repeat: matches 10..289 of consensus"
     repeat_region   53255..53564
                     /note="AluSq repeat: matches 2..311 of consensus"
     repeat_region   53749..53986
                     /note="MIR repeat: matches 3..254 of consensus"
     repeat_region   55654..55770

  Query Match             8.8%;  Score 37.2;  DB 9;  Length 179681;
  Best Local Similarity   47.4%;  Pred. No. 2.1;
  Matches  111;  Conservative    0;  Mismatches  123;  Indels    0;  Gaps    0;



Qy 10 cacgcgtccgaattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgt 69 

I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I 

Db 137319 CAGGCAATCAAAATAAAGTGAGTCACACAAATGTTTTTGTTTCCCAGTACATCTAAAAGT 137260 

Qy 70 caacatttgccttttcgcgttccaattactaatgttacggcattattcaggacagaactt 129 

I II I I I I I I I I I I II I I I I I I I I I I I I I I 

Db 137259 TATGTTTACACTATTCTTTAGTCTATTTAGTGTGCAATAGCATTATGTATTTAAGAACAA 137200 

Qy 130 tactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagc 189 

II III I I I I I I I I I I I I II I I I I I III I 

Db 137199 TATACATACATTATTTTTTAAATACTTTATTGCTAAAATATGCTAACAAACATCTGAGCC 137140 

Qy 190 caaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactgg 243 
I I I I I I I I I I I I I I II I I I I I I 



Db 137139 TTCAGTGAGTCATAATCTTTTTGCTGGTGGAGGGTCTTGCCTTGATGTGGCTGG 137086 



RESULT 5
AC020320
LOCUS       AC020320 74881 bp DNA HTG 03-JAN-2000
DEFINITION  Drosophila melanogaster, *** SEQUENCING IN PROGRESS in ordered
            pieces.
ACCESSION   AC020320
VERSION     AC020320.1 GI:6664577
KEYWORDS    HTG; HTGS_PHASE2.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1 (bases 1 to 74881)
  AUTHORS   Adams,M. and Venter,J.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-DEC-1999) Celera Genomics, 45 West Gude Drive,
            Rockville, MD, USA
COMMENT     This sequence was identified as CDM:10212756 by the submitter.
            For more information on this record e-mail to fly@celera.com.
            * NOTE: This is a 'working draft' sequence.
            * This sequence will be replaced
            * by the finished sequence as soon as it is available and
            * the accession number will be preserved.
FEATURES             Location/Qualifiers
     source          1..74881
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
BASE COUNT  21579 a 16039 c 16088 g 21175 t
ORIGIN



  Query Match             8.7%;  Score 36.6;  DB 2;  Length 74881;
  Best Local Similarity   51.5%;  Pred. No. 3;
  Matches   84;  Conservative    0;  Mismatches   79;  Indels    0;  Gaps    0;

Qy        143 tgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgcaa 202
              ||| || ||  | | |||||||| |||  ||| | |     | ||| |||  | |   ||
Db      12474 TGTCTTGAAGTCTTCCTGGGAAATGAAAATTGAATAAGACATACAGACAAAAATACAAAA 12533

Qy        203 ggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctgaca 262
                     | |  | | ||||||       |  || || |   ||||   || || | ||
Db      12534 TTAATGCTATTCAGGCAGCTGTTTGCATCGATTCCGAATAAAGTTTTTACCAATTTAACT 12593

Qy        263 tccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
              |  ||    | || |||    |||||    |  |  | || ||
Db      12594 TTAAATGTAATAAAAAAATAACTCAAATACTAATAAAGCTTAT 12636
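
The percentages in the summary lines above appear to follow from the counts reported beside them. The sketch below is an inference from the numbers printed in this report, not a statement of how GenCore computes them: it assumes "Best Local Similarity" is matches divided by aligned columns and "Query Match" is the score divided by the perfect score of 421 given in the report header. The function names are hypothetical.

```python
# Values copied from the RESULT 5 summary lines and the report header.
PERFECT_SCORE = 421  # "Perfect score" printed at the top of the report


def percent(numerator, denominator):
    """Return a percentage rounded to one decimal, as printed in the report."""
    return round(100.0 * numerator / denominator, 1)


def best_local_similarity(matches, conservative, mismatches, indels):
    # Assumption: similarity is matches over all aligned columns.
    aligned_columns = matches + conservative + mismatches + indels
    return percent(matches, aligned_columns)


def query_match(score, perfect_score=PERFECT_SCORE):
    # Assumption: query match is the hit score relative to the perfect score.
    return percent(score, perfect_score)


if __name__ == "__main__":
    print(best_local_similarity(84, 0, 79, 0))  # reported as 51.5%
    print(query_match(36.6))                    # reported as 8.7%
```

The same ratios reproduce the figures reported for the other hits in this section, for example 66 matches over 115 columns for RESULT 6 (57.4%).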



RESULT 6
HS1098F8/C
LOCUS       HS1098F8 114396 bp DNA PRI 15-MAR-2001
DEFINITION  Human DNA sequence from clone RP5-1098F8 on chromosome
            20p11.23-12.3. Contains an STS and GSSs, complete sequence.
ACCESSION   AL034547
VERSION     AL034547.14 GI:11139873
KEYWORDS    HTG.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 114396)
  AUTHORS   Barlow,K.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-2001) Sanger Centre, Hinxton, Cambridgeshire,
            CB10 1SA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone
            requests: clonerequest@sanger.ac.uk
COMMENT     On Nov 13, 2000 this sequence version replaced gi:9795173.
            During sequence assembly data is compared from overlapping clones.
            Where differences are found these are annotated as variations
            together with a note of the overlapping clone name. Note that the
            variation annotation may not be found in the sequence submission
            corresponding to the overlapping clone, as we submit sequences with
            only a small overlap as described above.

            The following abbreviations are used to associate primary accession
            numbers given in the feature table with their source databases:
            Em:, EMBL; Sw:, SWISSPROT; Tr:, TREMBL; Wp:, WORMPEP; Information
            on the WORMPEP database can be found at
            http://www.sanger.ac.uk/Projects/C_elegans/wormpep This sequence
            was generated from part of bacterial clone contigs of human
            chromosome 20, constructed by the Sanger Centre Chromosome 20
            Mapping Group. Further information can be found at
            http://www.sanger.ac.uk/HGP/Chr20

            IMPORTANT: This sequence is not the entire insert of clone
            RP5-1098F8 It may be shorter because we sequence overlapping
            sections only once, except for a 100 base overlap.
            The true right end of clone RP5-1098F8 is at 114396 in this
            sequence. The true right end of clone RP4-742J24 is at 100 in this
            sequence. This sequence was finished as follows unless otherwise
            noted: all regions were either double-stranded or sequenced with an
            alternate chemistry or covered by high quality data (i.e., phred
            quality >= 30); an attempt was made to resolve all sequencing
            problems, such as compressions and repeats; all regions were
            covered by at least one plasmid subclone or more than one M13
            subclone; and the assembly was confirmed by restriction digest.
            RP5-1098F8 is from the library RPCI-5 constructed by the group of
            Pieter de Jong. For further details see
            http://www.chori.org/bacpac/home.htm
            VECTOR: pCYPAC2.
FEATURES             Location/Qualifiers
     source          1..114396
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="20"
                     /map="p11.23-12.3"
                     /clone="RP5-1098F8"
                     /clone_lib="RPCI-5"
     repeat_region   371..461
                     /note="L2 repeat: matches 2627..2729 of consensus"
     repeat_region   819..885
                     /note="MIR repeat: matches 79..146 of consensus"



     repeat_region   2210..2339
                     /note="L2 repeat: matches 2188..2339 of consensus"
     repeat_region   2340..2792
                     /note="MLT1F repeat: matches 44..512 of consensus"
     repeat_region   3012..3148
                     /note="L2 repeat: matches 2553..2656 of consensus"
     repeat_region   3149..3442
                     /note="AluSq repeat: matches 1..296 of consensus"
     repeat_region   3443..3527
                     /note="L2 repeat: matches 2656..2749 of consensus"
     misc_feature    complement(5173..5567)
                     /note="match: GSS: Em:AQ136848"
     repeat_region   6571..6635
                     /note="MLT1F repeat: matches 14..83 of consensus"
     repeat_region   6884..6907
                     /note="12 copies 2 mer aa 100% conserved"
     repeat_region   7657..8052
                     /note="MLT1H repeat: matches 2..426 of consensus"
     repeat_region   8740..9303
                     /note="MLT2D repeat: matches 4..552 of consensus"
     repeat_region   9530..9580
                     /note="MLTH repeat: matches 96..138 of consensus"
     repeat_region   9612..9825
                     /note="L2 repeat: matches 1655..1871 of consensus"
     repeat_region   10324..10667
                     /note="L2 repeat: matches 1128..1491 of consensus"
     repeat_region   10841..11012
                     /note="MER5B repeat: matches 1..178 of consensus"
     repeat_region   11174..11277
                     /note="MER5B repeat: matches 74..177 of consensus"
     repeat_region   11278..11322
                     /note="MLT1C repeat: matches 1..45 of consensus"
     repeat_region   11323..11601
                     /note="AluSg repeat: matches 1..308 of consensus"
     repeat_region   11602..12030
                     /note="MLT1C repeat: matches 45..466 of consensus"
     repeat_region   12031..12095
                     /note="MER5B repeat: matches 10..74 of consensus"
     repeat_region   12793..12881
                     /note="MER5A repeat: matches 9..94 of consensus"
     repeat_region   12882..13252
                     /note="MLT1B repeat: matches 3..390 of consensus"
     repeat_region   13253..13271
                     /note="MER5A repeat: matches 94..112 of consensus"
     repeat_region   13340..13515
                     /note="MER5A repeat: matches 15..186 of consensus"
     repeat_region   13530..13661
                     /note="MER5A repeat: matches 26..184 of consensus"
     repeat_region   13813..15648
                     /note="MER52A repeat: matches 202..1755 of consensus"
     repeat_region   15673..15837
                     /note="MER5A repeat: matches 1..189 of consensus"
     repeat_region   16288..16388
                     /note="MLT1J repeat: matches 407..512 of consensus"
     repeat_region   16428..16716
                     /note="AluSx repeat: matches 1..289 of consensus"
     misc_feature    complement(17260..17705)
                     /note="[illegible]"
     repeat_region   [illegible]
                     /note="[illegible] copies 2 mer ta [illegible]% conserved"
     repeat_region   [illegible]
                     /note="[illegible] copies 2 mer at [illegible]% conserved"
     repeat_region   [illegible]
                     /note="20 copies 4 mer atat 70% conserved"
     repeat_region   [illegible]
                     /note="[illegible] repeat: matches 4..[illegible] of consensus"
     repeat_region   [illegible]
                     /note="[illegible] repeat: matches 2..[illegible] of consensus"
     repeat_region   [illegible]
                     /note="[illegible] repeat: matches 1..[illegible] of consensus"
     repeat_region   [illegible]
                     /note="[illegible] repeat: matches 1..[illegible] of consensus"
     repeat_region   [illegible]
                     /note="[illegible] repeat: matches [illegible] of consensus"
     repeat_region   [illegible]
                     /note="[illegible] repeat: matches [illegible] of consensus"
     repeat_region   [illegible]
                     /note="AluJo repeat: matches 1..[illegible] of consensus"
     repeat_region   [illegible]
                     /note="[illegible] repeat: matches [illegible] of consensus"
     repeat_region   [illegible]
                     /note="[illegible] repeat: matches 1..[illegible] of consensus"
     misc_feature    [illegible]
                     /note="match: [illegible]"
     misc_feature    25592..26084
                     /note="match: GSS: Em:AQ055943"
     repeat_region   26402..26641
                     /note="MLT1H repeat: matches 1..266 of consensus"
     repeat_region   26642..26946
                     /note="L2 repeat: matches 2453..2750 of consensus"
     misc_feature    complement(28404..28755)
                     /note="match: GSS: Em:AQ170827"
     misc_feature    28774..29229
                     /note="match: GSS: Em:AQ424970"
     repeat_region   29256..29368
                     /note="LTR16C repeat: matches 260..372 of consensus"
     repeat_region   29804..29847
                     /note="11 copies 4 mer acac 84% conserved"
     repeat_region   29812..29847
                     /note="18 copies 2 mer ac 91% conserved"
     repeat_region   30731..30794
                     /note="16 copies 4 mer ctat 95% conserved"
     repeat_region   31199..31720
                     /note="L1ME1 repeat: matches 5549..6099 of consensus"
     repeat_region   32580..32762
                     /note="MER5A repeat: matches 1..189 of consensus"
     misc_feature    complement(33816..34227)
                     /note="match: GSS: Em:AQ126427"
     repeat_region   34058..34566
                     /note="MLT1D repeat: matches 1..505 of consensus"
     misc_feature    34228..34906
                     /note="match: GSS: Em:AQ343317"
     misc_feature    34228..34630
                     /note="match: GSS: Em:AQ122515"
     repeat_region   39473..39776
                     /note="AluSq repeat: matches 5..[illegible] of consensus"
     misc_feature    complement(39567..40054)
                     /note="match: GSS: Em:AQ707622"
     repeat_region   [illegible]..39956
                     /note="MIR repeat: matches 57..200 of consensus"
     repeat_region   [illegible]..41620
                     /note="MIR repeat: matches 49..144 of consensus"
     repeat_region   [illegible]..42273
                     /note="L1MA5A repeat: matches 5977..6290 of consensus"
     repeat_region   [illegible]..42905
                     /note="MIR repeat: matches 49..146 of consensus"
     repeat_region   [illegible]..43299
                     /note="AluSx repeat: matches 3..305 of consensus"
     repeat_region   [illegible]..44023
                     /note="L2 repeat: matches 2427..2750 of consensus"
     repeat_region   [illegible]..45206
                     /note="L2 repeat: matches 2643..2741 of consensus"
     repeat_region   45492..45622
                     /note="MLT1H repeat: matches 29..159 of consensus"
     misc_feature    45707..45875
                     /note="match: GSS: Em:AQ793014"
     repeat_region   45878..45917
                     /note="20 copies 2 mer tt 77% conserved"
     repeat_region   45984..46283
                     /note="AluSx repeat: matches 1..310 of consensus"




  Query Match             8.7%;  Score 36.6;  DB 9;  Length 114396;
  Best Local Similarity   57.4%;  Pred. No. 3.1;
  Matches   66;  Conservative    0;  Mismatches   49;  Indels    0;  Gaps    0;

Qy        101 atgttacggcattattcaggacagaactttactggaacgtcctgtgttcaatgcattctg 160
              |||  ||| | ||  ||||||| | || |    |  | | |||||  ||    || | ||
Db      79780 ATGAGACGACTTTTCTCAGGACTGGACATGGTGGCTAAGACCTGTAATCCTAACACTTTG 79721

Qy        161 ggaaaggaatgttgcagagtctgtgcagccaaggagatgcaaggccatttgtgga 215
              ||| |   | || | ||  ||  || |||| ||||| |  |   |  | || | |
Db      79720 GGAGACATAAGTGGGAGGATCACTGGAGCCCAGGAGTTTTATACCAGTCTGGGCA 79666



RESULT 7
AC011696/C
LOCUS       AC011696 170869 bp DNA INV 23-MAR-2001
DEFINITION  Drosophila melanogaster, chromosome 2R, region 48A-48C, BAC clone
            BACR35F01, complete sequence.
ACCESSION   AC011696
VERSION     AC011696.4 GI:13435224
KEYWORDS    HTG.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1 (bases 1 to 170869)
  AUTHORS   Celniker,S.E., Adams,M.D., Kronmiller,B., Tyler,D., Wan,K.H.,
            Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Brandon,R.C.,
            Rogers,Y., An,H., Baldwin,D., Banzon,J., Beeson,K.Y., Busam,D.A.,
            Carlson,J.W., Center,A., Champe,M., Davenport,L.B., Dietz,S.M.,
            Dodson,K., Dorsett,V., Doup,L.E., Doyle,C., Dresnek,D., Farfan,D.,
            Ferriera,S., Frise,E., Galle,R.F., Garg,N.S., George,R.A.,
            Gonzalez,M., Houck,J., Hoskins,R.A., Hostin,D., Howland,T.J.,
            Ibegwam,C., Jalali,M., Kruse,D., Li,P., Mattei,B., Moshrefi,A.,
            Mcintosh,T.C., Moy,M., Murphy,B., Nelson,C., Nelson,K.A., Nunoo,J.,
            Pacleb,J., Paragas,V., Park,S., Patel,S., Pfeiffer,B.,
            Phouanenavong,S., Pittman,G.S., Puri,V., Richards,S., Scheeler,F.,
            Stapleton,M., Strong,R., Svirskas,R., Tector,C., Williams,S.M.,
            Zaveri,J.S., Smith,H.O., Rubin,G.M. and Venter,J.C.
  TITLE     Sequencing of Drosophila chromosome 2R, region 48A-48C
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 170869)
  AUTHORS   Celniker,S.E., Agbayani,A., Arcaina,T.T., Baxter,E., Blazej,R.G.,
            Butenhoff,C., Champe,M., Chavez,C., Chew,M., Ciesiolka,L.,
            Doyle,C.M., Farfan,D.E., Galle,R., George,R.A., Harris,N.L.,
            Hoskins,R.A., Houston,K.A., Hummasti,S.R., Karra,K., Kearney,L.,
            Kim,E., Lee,B., Lewis,S., Li,P., Lomotan,M.A., Mazda,P.,
            Moshrefi,A.R., Moshrefi,M., Nixon,K., Pacleb,J.M., Park,S.,
            Pfeiffer,B., Poon,L., Sequeira,A., Sethi,H., Snir,E.,
            Svirskas,R.R., Wan,K.H., Weinburg,T., Zhang,R., Zieran,L.L. and
            Rubin,G.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-OCT-1999) Drosophila Genome Center, Lawrence Berkeley
            Laboratory, MS 64-121, Berkeley, CA 94720, USA
COMMENT     On Mar 23, 2001 this sequence version replaced gi:6119492.
            Sequence submitted by:
            Berkeley Drosophila Genome Project
            Lawrence Berkeley National Laboratory, MS 64-121
            Berkeley, CA 94720
            This sequence was assembled using end sequences from a whole genome
            shotgun and from subclones of this BAC and its neighboring clones.
            For further information about this sequence, including its location
            and relationship to other sequences, please visit our sequence
            archive Web site (http://www.fruitfly.org/sequence/) or send email
            to bdgp@fruitfly.berkeley.edu.
FEATURES             Location/Qualifiers
     source          1..170869
                     /organism="Drosophila melanogaster"
                     /strain="y; cn bw sp"
                     /db_xref="taxon:7227"
                     /chromosome="2R"
                     /map="48A-48C"
                     /clone="BACR35F01 (D1156)"
                     /clone_lib="RPCI-98 (Roswell Park Cancer Institute
                     Drosophila melanogaster BAC library, partial EcoRI in
                     pBACe3.6)"
BASE COUNT  49941 a 35331 c 35461 g 50136 t
ORIGIN



  Query Match             8.7%;  Score 36.6;  DB 3;  Length 170869;
  Best Local Similarity   51.5%;  Pred. No. 3.2;
  Matches   84;  Conservative    0;  Mismatches   79;  Indels    0;  Gaps    0;

Qy        143 tgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgcaa 202
              ||| || ||  | | |||||||| |||  ||| | |     | ||| |||  | |   ||
Db      45086 TGTCTTGAAGTCTTCCTGGGAAATGAAAATTGAATAAGACATACAGACAAAAATACAAAA 45027

Qy        203 ggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctgaca 262
                     | |  | | ||||||       |  || || |   ||||   || || | ||
Db      45026 TTAATGCTATTCAGGCAGCTGTTTGCATCGATTCCGAATAAAGTTTTTACCAATTTAACT 44967

Qy        263 tccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
              |  ||    | || |||    |||||    |  |  | || ||
Db      44966 TTAAATGTAATAAAAAAATAACTCAAATACTAATAAAGCTTAT 44924



RESULT 8
AC007473/C
LOCUS       AC007473 171831 bp DNA INV 28-FEB-2001
DEFINITION  Drosophila melanogaster, chromosome 2R, region 48A-48B, BAC clone
            BACR38D12, complete sequence.
ACCESSION   AC007473
VERSION     AC007473.10 GI:13162476
KEYWORDS    HTG.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1 (bases 1 to 171831)
  AUTHORS   Celniker,S.E., Adams,M.D., Kronmiller,B., Tyler,D., Wan,K.H.,
            Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Brandon,R.C.,
            Rogers,Y., An,H., Baldwin,D., Banzon,J., Beeson,K.Y., Busam,D.A.,
            Carlson,J.W., Center,A., Champe,M., Davenport,L.B., Dietz,S.M.,
            Dodson,K., Dorsett,V., Doup,L.E., Doyle,C., Dresnek,D., Farfan,D.,
            Ferriera,S., Frise,E., Galle,R.F., Garg,N.S., George,R.A.,
            Gonzalez,M., Houck,J., Hoskins,R.A., Hostin,D., Howland,T.J.,
            Ibegwam,C., Jalali,M., Kruse,D., Li,P., Mattei,B., Moshrefi,A.,
            Mcintosh,T.C., Moy,M., Murphy,B., Nelson,C., Nelson,K.A., Nunoo,J.,
            Pacleb,J., Paragas,V., Park,S., Patel,S., Pfeiffer,B.,
            Phouanenavong,S., Pittman,G.S., Puri,V., Richards,S., Scheeler,F.,
            Stapleton,M., Strong,R., Svirskas,R., Tector,C., Williams,S.M.,
            Zaveri,J.S., Smith,H.O., Rubin,G.M. and Venter,J.C.
  TITLE     Sequencing of Drosophila chromosome 2R, region 48A-48B
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 171831)
  AUTHORS   Celniker,S.E., Agbayani,A., Arcaina,T.T., Baxter,E., Blazej,R.G.,
            Butenhoff,C., Champe,M., Chavez,C., Chew,M., Ciesiolka,L.,
            Doyle,C.M., Farfan,D.E., Galle,R., George,R.A., Harris,N.L.,
            Hoskins,R.A., Houston,K.A., Hummasti,S.R., Karra,K., Kearney,L.,
            Kim,E., Lee,B., Lewis,S., Li,P., Lomotan,M.A., Mazda,P.,
            Moshrefi,A.R., Moshrefi,M., Nixon,K., Pacleb,J.M., Park,S.,
            Pfeiffer,B., Poon,L., Sequeira,A., Sethi,H., Snir,E.,
            Svirskas,R.R., Wan,K.H., Weinburg,T., Zhang,R., Zieran,L.L. and
            Rubin,G.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-MAY-1999) Drosophila Genome Center, Lawrence Berkeley
            Laboratory, MS 64-121, Berkeley, CA 94720, USA
COMMENT     On Feb 28, 2001 this sequence version replaced gi:5670618.
            Sequence submitted by:
            Berkeley Drosophila Genome Project
            Lawrence Berkeley National Laboratory, MS 64-121
            Berkeley, CA 94720
            This sequence was assembled using end sequences from a whole genome
            shotgun and from subclones of this BAC and its neighboring clones.
            For further information about this sequence, including its location
            and relationship to other sequences, please visit our sequence
            archive Web site (http://www.fruitfly.org/sequence/) or send email
            to bdgp@fruitfly.berkeley.edu.
FEATURES             Location/Qualifiers
     source          1..171831
                     /organism="Drosophila melanogaster"
                     /strain="y; cn bw sp"
                     /db_xref="taxon:7227"
                     /chromosome="2R"
                     /map="48A-48B"
                     /clone="BACR38D12 (D590)"
                     /clone_lib="RPCI-98 (Roswell Park Cancer Institute
                     Drosophila melanogaster BAC library, partial EcoRI in
                     pBACe3.6)"
BASE COUNT  49756 a 35946 c 36141 g 49988 t
ORIGIN



  Query Match             8.7%;  Score 36.6;  DB 3;  Length 171831;
  Best Local Similarity   51.5%;  Pred. No. 3.2;
  Matches   84;  Conservative    0;  Mismatches   79;  Indels    0;  Gaps    0;

Qy        143 tgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgcaa 202
              ||| || ||  | | |||||||| |||  ||| | |     | ||| |||  | |   ||
Db      85405 TGTCTTGAAGTCTTCCTGGGAAATGAAAATTGAATAAGACATACAGACAAAAATACAAAA 85346

Qy        203 ggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctgaca 262
                     | |  | | ||||||       |  || || |   ||||   || || | ||
Db      85345 TTAATGCTATTCAGGCAGCTGTTTGCATCGATTCCGAATAAAGTTTTTACCAATTTAACT 85286

Qy        263 tccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
              |  ||    | || |||    |||||    |  |  | || ||
Db      85285 TTAAATGTAATAAAAAAATAACTCAAATACTAATAAAGCTTAT 85243





RESULT 9
AC023370/C
LOCUS       AC023370 183215 bp DNA HTG 15-FEB-2001
DEFINITION  Homo sapiens clone RP11-20L19, WORKING DRAFT SEQUENCE, 11 unordered
            pieces.
ACCESSION   AC023370
VERSION     AC023370.4 GI:12831386
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 183215)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens, clone RP11-20L19
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 183215)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
            Anderson,S., Baldwin,J., Barna,N., Beda,F., Boguslavkiy,L.,
            Boukhgalter,B., Brown,A., Burkett,G., Campopiano,A., Castle,A.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cooke,P.,
            DeArellano,K., Dewar,K., Dodge,S., Domino,M., Doyle,M.,
            Fenestor,J., Ferreira,P., FitzHugh,W., Forrest,C., Gage,D.,
            Galagan,J., Gardyna,S., Ginde,S., Goyette,M., Graham,L.,
            Grand-Pierre,N., Grant,G., Hagos,B., Heaford,A., Horton,L.,
            Howland,J.C., Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A.,
            Klein,J., Landers,T., Largocque,K., Lehoczky,J., Levine,R.,
            Lieu,C., Liu,G., Locke,K., Macdonald,P., Marquis,N., McCarthy,M.,
            McEwan,P., McGurk,A., McKernan,K., McPheeters,R., Meldrim,J.,
            Meneus,L., Mihova,T., Miranda,C., Mlenga,V., Morrow,J., Naylor,J.,
            Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D., Olivar,T.M.,
            Peterson,K., Pierre,N., Pisani,C., Pollara,V., Raymond,C.,
            Riley,R., Rogov,P., Rothman,D., Roy,A., Santos,R., Schauer,S.,
            Severy,P., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
            Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J., Tirrell,A.,
            Travers,M., Trigilio,J., Vassiliev,H., Viel,R., Vo,A., Wilson,B.,
            Wu,X., Wyman,D., Ye,W.J., Young,G., Zainoun,J., Zimmer,A. and
            Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-FEB-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     On Feb 15, 2001 this sequence version replaced gi:10280847.
            All repeats were identified using RepeatMasker:
            Smit, A.F.A. & Green, P. (1996-1997)
            http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
            Center: Whitehead Institute/MIT Center for Genome Research
            Center code: WIBR
            Web site: http://www-seq.wi.mit.edu
            Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
            Center project name: L3926
            Center clone name: 20_L_19
            Summary Statistics
            Sequencing vector: M13; M77815; 47% of reads
            Sequencing vector: Plasmid; n/a; 53% of reads
            Chemistry: Dye-terminator Big Dye; 100% of reads
            Assembly program: Phrap; version 0.960731
            Consensus quality: 173128 bases at least Q40
            Consensus quality: 177123 bases at least Q30
            Consensus quality: 179772 bases at least Q20
            Insert size: 170000; agarose-fp
            Insert size: 182215; sum-of-contigs
            Quality coverage: 6.8 in Q20 bases; agarose-fp
            Quality coverage: 6.3 in Q20 ba .
            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 11 contigs. The true order of the pieces
            * is not known and their order in this sequence record is
            * arbitrary. Gaps between the contigs are represented as
            * runs of N, but the exact sizes of the gaps are unknown.
            * This record will be updated with the finished sequence
            * as soon as it is available and the accession number will
            * be preserved.

            *        1     2147: contig of 2147 bp in length
            *     2148     2247: gap of 100 bp
            *     2248     3360: contig of 1113 bp in length
            *     3361     3460: gap of 100 bp
            *     3461     4570: contig of 1110 bp in length
            *     4571     4670: gap of 100 bp
            *     4671     5920: contig of 1250 bp in length
            *     5921     6020: gap of 100 bp
            *     6021     7417: contig of 1397 bp in length
            *     7418     7517: gap of 100 bp
            *     7518     8558: contig of 1041 bp in length
            *     8559     8658: gap of 100 bp
            *     8659     9869: contig of 1211 bp in length
            *     9870     9969: gap of 100 bp
            *     9970    12680: contig of 2711 bp in length
            *    12681    12780: gap of 100 bp
            *    12781    52103: contig of 39323 bp in length
            *    52104    52203: gap of 100 bp
            *    52204   103513: contig of 51310 bp in length
            *   103514   103613: gap of 100 bp
            *   103614   183215: contig of 79602 bp in length.



FEATURES             Location/Qualifiers
     source          1..183215
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="RP11-20L19"
                     /clone_lib="RPCI-11 Human Male BAC"
     misc_feature    1..2147
                     /note="assembly_fragment
                     clone_end:SP6
                     vector_side:left"
     misc_feature    2248..3360
                     /note="assembly_fragment"
     misc_feature    3461..4570
                     /note="assembly_fragment"
     misc_feature    4671..5920
                     /note="assembly_fragment"
     misc_feature    6021..7417
                     /note="assembly_fragment"
     misc_feature    7518..8558
                     /note="assembly_fragment"
     misc_feature    8659..9869
                     /note="assembly_fragment"
     misc_feature    9970..12680
                     /note="assembly_fragment"
     misc_feature    12781..52103
                     /note="assembly_fragment"
     misc_feature    52204..103513
                     /note="assembly_fragment"
     misc_feature    103614..183215
                     /note="assembly_fragment
                     clone_end:T7
                     vector_side:right"
BASE COUNT  59016 a 32076 c 31329 g 59794 t 1000 others
ORIGIN



  Query Match             8.7%;  Score 36.6;  DB 2;  Length 183215;
  Best Local Similarity   60.6%;  Pred. No. 3.3;
  Matches   60;  Conservative    0;  Mismatches   39;  Indels    0;  Gaps    0;

Qy        310 tctctgttcaatcatgctgaaaaacagtgggtgggatttctgggcccaatacatatatcg 369
              ||| ||||     |||  ||  ||| |||  | |   || |||  | ||| ||  || |
Db      19667 TCTATGTTTCTAGATGGGGACCAACTGTGACTAGTTATTTTGGAGCAAATCCAGCTAACT 19608

Qy        370 caggggatagatgaccgagtgatctcgccctcagtggca 408
              || || |||    | | ||||||   | ||||||| |||
Db      19607 CAAGGAATACTCAAACCAGTGATAGGGTCCTCAGTAGCA 19569



RESULT 10
AE003825/C
LOCUS       AE003825   278196 bp    DNA             INV       04-OCT-2000
DEFINITION  Drosophila melanogaster genomic scaffold 142000013386047 section 18
            of 52, complete sequence.
ACCESSION   AE003825 AE002787
VERSION     AE003825.2  GI:10727634
KEYWORDS    HTG.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 278196)
  AUTHORS   Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D.,
            Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F.,
            George,R.A., Lewis,S.E., Richards,S., Ashburner,M., Henderson,S.N.,
            Sutton,G.G., Wortman,J.R., Yandell,M.D., Zhang,Q., Chen,L.X.,
            Brandon,R.C., Rogers,Y.H., Blazej,R.G., Champe,M., Pfeiffer,B.D.,
            Wan,K.H., Doyle,C., Baxter,E.G., Helt,G., Nelson,C.R., Gabor
            Miklos,G.L., Abril,J.F., Agbayani,A., An,H.J.,
            Andrews-Pfannkoch,C., Baldwin,D., Ballew,R.M., Basu,A.,
            Baxendale,J., Bayraktaroglu,L., Beasley,E.M., Beeson,K.Y.,
            Benos,P.V., Berman,B.P., Bhandari,D., Bolshakov,S., Borkova,D.,
            Botchan,M.R., Bouck,J., Brokstein,P., Brottier,P., Burtis,K.C.,
            Busam,D.A., Butler,H., Cadieu,E., Center,A., Chandra,I.,
            Cherry,J.M., Cawley,S., Dahlke,C., Davenport,L.B., Davies,P., de
            Pablos,B., Delcher,A., Deng,Z., Mays,A.D., Dew,I., Dietz,S.M.,
            Dodson,K., Doup,L.E., Downes,M., Dugan-Rocha,S., Dunkov,B.C.,
            Dunn,P., Durbin,K.J., Evangelista,C.C., Ferraz,C., Ferriera,S.,
            Fleischmann,W., Fosler,C., Gabrielian,A.E., Garg,N.S.,
            Gelbart,W.M., Glasser,K., Glodek,A., Gong,F., Gorrell,J.H., Gu,Z.,
            Guan,P., Harris,M., Harris,N.L., Harvey,D., Heiman,T.J.,
            Hernandez,J.R., Houck,J., Hostin,D., Houston,K.A., Howland,T.J.,
            Wei,M.H., Ibegwam,C., Jalali,M., Kalush,F., Karpen,G.H., Ke,Z.,
            Kennison,J.A., Ketchum,K.A., Kimmel,B.E., Kodira,C.D., Kraft,C.,
            Kravitz,S., Kulp,D., Lai,Z., Lasko,P., Lei,Y., Levitsky,A.A.,
            Li,J., Li,Z., Liang,Y., Lin,X., Liu,X., Mattei,B., McIntosh,T.C.,
            McLeod,M.P., McPherson,D., Merkulov,G., Milshina,N.V., Mobarry,C.,
            Morris,J., Moshrefi,A., Mount,S.M., Moy,M., Murphy,B., Murphy,L.,
            Muzny,D.M., Nelson,D.L., Nelson,D.R., Nelson,K.A., Nixon,K.,
            Nusskern,D.R., Pacleb,J.M., Palazzolo,M., Pittman,G.S., Pan,S.,
            Pollard,J., Puri,V., Reese,M.G., Reinert,K., Remington,K.,
            Saunders,R.D., Scheeler,F., Shen,H., Shue,B.C., Siden-Kiamos,I.,
            Simpson,M., Skupski,M.P., Smith,T., Spier,E., Spradling,A.C.,
            Stapleton,M., Strong,R., Sun,E., Svirskas,R., Tector,C., Turner,R.,
            Venter,E., Wang,A.H., Wang,X., Wang,Z.Y., Wassarman,D.A.,
            Weinstock,G.M., Weissenbach,J., Williams,S.M., Woodage,T.,
            Worley,K.C., Wu,D., Yang,S., Yao,Q.A., Ye,J., Yeh,R.F.,
            Zaveri,J.S., Zhan,M., Zhang,G., Zhao,Q., Zheng,L., Zheng,X.H.,
            Zhong,F.N., Zhong,W., Zhou,X., Zhu,S., Zhu,X., Smith,H.O.,
            Gibbs,R.A., Myers,E.W., Rubin,G.M. and Venter,J.C.
  TITLE     The genome sequence of Drosophila melanogaster
  JOURNAL   Science 287 (5461), 2185-2195 (2000)
  MEDLINE   20196006
REFERENCE   2  (bases 1 to 278196)
  AUTHORS   Adams,M.D., Celniker,S.E., Gibbs,R.A., Rubin,G.M. and Venter,C.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-MAR-2000) Celera Genomics, 45 West Gude Drive,
            Rockville, MD, USA
COMMENT     On Oct 9, 2000 this sequence version replaced gi:7303570.

FEATURES             Location/Qualifiers
     source          1..278196
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /chromosome="2R"
     mRNA            join(6421..7995,36785..36945,37365..37899)
                     /gene="inv"
                     /note="Nucleotide sequence of the Celera sequence differs
                     from the published sequence for this transcript."
                     /product="CT39612"
                     /db_xref="FLYBASE:FBan0017835"
                     /db_xref="FLYBASE:FBgn0001269"
     gene            <6421..>37899
                     /gene="inv"
                     /note="CG17835"
                     /map="47F15-48A1"
                     /db_xref="FLYBASE:FBan0017835"
                     /db_xref="FLYBASE:FBgn0001269"
     CDS             join(6692..7995,36785..36945,37365..37606)
                     /gene="inv"
                     /note="inv gene product; Nucleotide sequence of the Celera
                     sequence differs from the published sequence for this
                     transcript"
                     /codon_start=1
                     /db_xref="FLYBASE:FBan0017835"
                     /db_xref="FLYBASE:FBgn0001269"
                     /protein_id="AAF58640.1"
                     /db_xref="GI:7303587"
                     /translation="MSTLASTRPPPLKLTIPSLEEAEDHAQERRAGGGGQEVGKMHPD
                     CLPLPLVQPGNSPQVREEEEDEQTECEEQLNIEDEEVEEEHDLDLEDPASCCSENSVL
                     SVGQEQSEAAQAALSAQAQARQRLLISQIYRPSAFSSTATTVLPPSEGPPFSPEDLLQ
                     LPPSTGTFQEEFLRKSQLYAEELMKQQMHLMAAARVNALTAAAAGKQLQMAMAAAAVA
                     TVPSGQDALAQLTATALGLGPGGAVHPHQQLLLQRDQVHHHHHMQNHLNNNENLHERA
                     LKFSIDNILKADFGSRLPKIGALSGNIGGGSVSGSSTGSSKNSGNTNGNRSPLKAPKK
                     SGKPLNLAQSNAAANSSLSFSSSLANICSNSNDSNSTATSSSTTNTSGAPVDLVKSPP
                     PAAGAGATGASGKSGEDSGTPIVWPAWVYCTRYSDRPSSGESKSTSAKAQEAGDVQFG
                     GRWWGWGRREGGGRRWGRGAGGQKAANGLQRNAVGQTEARVQRESLSDGEATPAAERG
                     TGTERGADQDLVPEQTGQAEKVERHQESAGAAADGAGIVQPLDDTADPRGGGAAGAAG
                     GG"

     mRNA            complement(join(54356..54366,54769..55563,55844..55941,
                     57075..58578))
                     /gene="en"
                     /note="Nucleotide sequence of the Celera sequence differs
                     from the published sequence for this transcript."
                     /product="CT25904"
                     /db_xref="FLYBASE:FBan0009015"
                     /db_xref="FLYBASE:FBgn0000577"
     gene            complement(<54356..>58578)
                     /gene="en"
                     /note="CG9015"
                     /map="48A1-48A3"
                     /db_xref="FLYBASE:FBan0009015"
                     /db_xref="FLYBASE:FBgn0000577"
     CDS             complement(join(55315..55563,55844..55941,57075..58386))
                     /gene="en"
                     /note="en gene product; Nucleotide sequence of the Celera
                     sequence differs from the published sequence for this
                     transcript"
                     /codon_start=1
                     /db_xref="FLYBASE:FBan0009015"
                     /db_xref="FLYBASE:FBgn0000577"
                     /protein_id="AAF58639.1"
                     /db_xref="GI:7303586"
                     /translation="MALEDRCSPQSAPSPITLQMQHLHHQQQQQQQQQQQMQHLHQLQ
                     QLQQLHQQQLAAGVFHHPAMAFDAAAAAAAAAAAAAAHAHAAALQQRLSGSGSPASCS
                     TPASSTPLTIKEEESDSVIGDMSFHNQTHTTNEEEEAEEDDDIDVDVDDTSAGGRLPP
                     PAHQQQSTAKPSLAFSISNILSDRFGDVQKPGKSMENQASIFRPFEASRSQTATPSAF
                     TRVDLLEFSRQQQAAAAAATAAMMLERANFLNCFNPAAYPRIHEEIVQSRLRRSAANA
                     VIPPPMSSKMSDANPEKSALGSLCKAVSQIGQPAAPTMTQPPLSSSASSLASPPPASN
                     ASTISSTSSVATSSSSSSSGCSSAASSLNSSPSSRLGASGSGVNASSPQPQPIPPPSA
                     VSRDSGMESSDDTRSETGSTTTEGGKNEMWPAWVYCTRYSDRPSSGPRYRRPKQPKDK
                     TNDEKRPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAK
                     IKKSTGSKNPLALQLMAQGLYNHTTVPLTKEEEELEMRMNGQIP"

     mRNA            complement(join(<110327..110528,110589..110699,
                     110765..110886,110951..111578,111665..112114,
                     112344..112861,113731..113926,113977..114194,
                     114258..116277,116383..117741,117813..117854,
                     118254..118298,118362..118427,118487..118843,
                     119079..119351,119414..120656,120719..121341,
                     121403..121870,122401..122646,122709..122835))
                     /gene="CG10897"
                     /product="CT30517"
                     /db_xref="FLYBASE:FBan0010897"
                     /db_xref="FLYBASE:FBgn0033636"
     gene            complement(<110327..>122835)
                     /gene="CG10897"
                     /map="48A3-48A5"
                     /db_xref="FLYBASE:FBan0010897"
                     /db_xref="FLYBASE:FBgn0033636"
     CDS             complement(join(121527..121870,122401..122646,
                     122709..122835))
                     /gene="CG10897"
                     /note="CG10897 gene product"
                     /codon_start=1
                     /db_xref="FLYBASE:FBan0010897"
                     /db_xref="FLYBASE:FBgn0033636"
                     /protein_id="AAF58638.1"
                     /db_xref="GI:7303585"
                     /translation="MNKNAGDGSDGKNSNKNSNAGGGGGAGGPHDPTGLLDAASLFAY
                     WGRDPTGAAAAAASNPLFNSQFNAAAAAGLGLLPQAGGASANDRYSMAAAAAAAAGAH
                     HHQNTMAVAASQAASLAGLHPAISCPGLLQSPASLGSFPLSTSRSGGCLWTSWDGNGR
                     RCWWSGCWRWWSRCTRPGRRWKWRKWQRWWWILIGFFRQQVKQVAQGEASRPAATTAT
                     AKSGQQSKCGSCGGSSSSSQ"

     mRNA            complement(join(147646..148170,148232..148556,
                     148618..>149725))
                     /gene="CG9006"
                     /product="CT25884"
                     /db_xref="FLYBASE:FBan0009006"
                     /db_xref="FLYBASE:FBgn0033637"
     gene            complement(<147646..>149725)
                     /gene="CG9006"
                     /map="48B1-48B1"
                     /db_xref="FLYBASE:FBan0009006"
                     /db_xref="FLYBASE:FBgn0033637"
     CDS             complement(join(147646..148170,148232..148556,
                     148618..149687))
                     /gene="CG9006"
                     /note="CG9006 gene product"
                     /codon_start=1
                     /db_xref="FLYBASE:FBan0009006"
                     /db_xref="FLYBASE:FBgn0033637"
                     /protein_id="AAF58637.1"
                     /db_xref="GI:7303584"
                     /translation="MRPNLFSGASRLLTYSRNGKLLTRGRSTKATSSSLDSQHQDAAT
                     TEGGRAESVEESPEQQRKLPTREPLAKNFFIGVVDKELLAYPEVIPRDEMAQLENSLL
                     PLKNYFVEPRETEETSPETLRQLGLYGLNVSTDYEGKGYGWSASLMASEPDSTDINVT
                     LGLQTHRVVVDLLKEVGTPLQQQRYLQDLATGKLIGTEAIYEISPPEEDYFNTTAELF
                     PEYGKWQLNGEKSFVICTPGERQLFLVLAQTQQPNVPGVLGRGTTIFLVDSQQEGVRL
                     GEKHATFGCRKAEIRRVHFEGVKLGEDQVVGLPHDGNRYSEQLVRSSRLRGSLVGLSL
                     AKKLLNELAQYTVNTTQCGVQLQDLELTRIHMSRAMCSVYAMESMLYLTAGLLDEFRA
                     QDVTLESAITKYFTLRQVYAIASQNLGVVGPKSLLSGETTELGLRDAAQLCTQGESLD
                     TLGMFIALTGLQHAGQAMNTGVRKSRNPLFNPGHIFGKFLDNNSIDNPKTKMQLSEHV
                     HPSLEAAAQCIELSVARLQMAVELMFTKHGNAWERQSEMQRLAEVGTLIYAMWASVA
                     RASRSYCIGLPLADHELLTATAICSEGRDRVRTLCTEIYGGHFVNNDNNLVRLSKQVA
                     KSKGYFAVHPLTFNF"

     mRNA            complement(join(151561..151756,151832..152133,
                     152333..153187,153247..154618,154680..154816,
                     154877..156011,157113..157127,157294..157318,
                     162657..>163284))
                     /gene="CG9005"
                     /product="CT25874"



Query Match 8.7%; Score 36.6; DB 3; Length 278196;
Best Local Similarity 51.5%; Pred. No. 3.4;
Matches 84; Conservative 0; Mismatches 79; Indels 0; Gaps 0;

Qy 143 tgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgcaa 202

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II 

Db 159599 TGTCTTGAAGTCTTCCTGGGAAATGAAAATTGAATAAGACATACAGACAAAAATACAAAA 159540 

Qy 203 ggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctgaca 262 

II I I I II I II I I I I I I II I I II I I I I I 

Db 159539 TTAATGCTATTCAGGCAGCTGTTTGCATCGATTCCGAATAAAGTTTTTACCAATTTAACT 159480



Qy 263 tccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
I II I I I I I I I I I I I I I I I I I I 



Db 159479 TTAAATGTAATAAAAAAATAACTCAAATACTAATAAAGCTTAT 159437 



RESULT 11
SAU02975
LOCUS       SAU02975     3911 bp    mRNA            VRT       29-OCT-1993
DEFINITION  Squalus acanthias proteolipid protein DM gamma mRNA, complete cds.
ACCESSION   U02975
VERSION     U02975.1  GI:409973
KEYWORDS    .
SOURCE      spiny dogfish.
  ORGANISM  Squalus acanthias
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Chondrichthyes;
            Elasmobranchii; Squalea; Squaloidei; Squalidae; Squalus.
REFERENCE   1  (bases 1 to 3911)
  AUTHORS   Kitagawa,K., Sinoway,M.P., Yang,C., Gould,R.M. and Colman,D.R.
  TITLE     A proteolipid protein gene family: expression in sharks and rays
            and possible evolution from an ancestral gene encoding a
            pore-forming polypeptide
  JOURNAL   Neuron 11 (3), 433-448 (1993)
  MEDLINE   94000810
REFERENCE   2  (bases 1 to 3911)
  AUTHORS   Colman,D.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-OCT-1993) David R. Colman, Brookdale Center for
            Molecular Biology, Mount Sinai School of Medicine, 1 Gustave L.
            Levy Place, New York, NY 10029, USA
FEATURES             Location/Qualifiers
     source          1..3911
                     /organism="Squalus acanthias"
                     /db_xref="taxon:7797"
                     /clone="DM beta"
                     /tissue_type="brain"
                     /dev_stage="adult"
     5'UTR           1..90
     CDS             91..831
                     /codon_start=1
                     /product="proteolipid protein DM gamma"
                     /protein_id="AAC59641.1"
                     /db_xref="GI:409974"
                     /translation="MGCFECCIKCLGGVPYASLLATILCFSGVALFCGCGHVALTKVE
                     RIVQLYFSNNASDHVLLTDVIQMMHYVIYGVASFSFLYGIILLAEGFYTTSAVKEIHG
                     EFKTTVCGRCISGMSVFLTYLLGIAWLGVFGFSAVPAFIYYNMWSACQTISSPPVNLT
                     TVIEEICVDVRQYGIIPWNASPGKACGSTLTTICNTSEFDLSYHLFIVACAGAGATVI
                     ALLIYMMATTYNFAVLKFKSREDCCTKF"
     3'UTR           832..3911
BASE COUNT     1201 a    777 c    765 g   1168 t
ORIGIN



Query Match 8.6%; Score 36.4; DB 5; Length 3911; 

Best Local Similarity 53.5%; Pred. No. 2.6; 

Matches 76; Conservative 0; Mismatches 66; Indels 0; Gaps 0; 

Qy 154 cattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgcaaggccatttgtg 213 

I I I I I I I I I Mill I I I I I I II II I I I I I I III 

Db 345 CCTCCTGGCAGAAGGGTTTTACACCACAAGCGCTGTTAAGGAGATTCATGGTGAGTTCAA 404



Qy 214 gacgaagctgtgctgcaagtatctgactggggtttcagcctatctgacatccaactgcag 273

III I I I I I I I I III I I I I I I I I I I I I I I I I 
Db 405 AACAACTGTGTGTGGACGCTGCATCAGTGGAATGTCTGTCTTTCTGACCTACCTGTTGGG 464

Qy 274 aagaaagaggctcaaggctttt 295 

III I I I I I I I I I I 
Db 465 AATAGCGTGGCTGGGAGTTTTT 486



RESULT 12
AL589684/C
LOCUS       AL589684    94555 bp    DNA             PRI       06-APR-2001
DEFINITION  Human DNA sequence from clone RP11-437J19 on chromosome 6, complete
            sequence.
ACCESSION   AL589684
VERSION     AL589684.7  GI:13561020
KEYWORDS    HTG.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 94555)
  AUTHORS   Dunn,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-APR-2001) Sanger Centre, Hinxton, Cambridgeshire,
            CB10 1SA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone
            requests: clonerequest@sanger.ac.uk
COMMENT     On Apr 8, 2001 this sequence version replaced gi:13398844.
            During sequence assembly data is compared from overlapping clones.
            Where differences are found these are annotated as variations
            together with a note of the overlapping clone name. Note that the
            variation annotation may not be found in the sequence submission
            corresponding to the overlapping clone, as we submit sequences with
            only a small overlap as described above.
            This sequence was finished as follows unless otherwise noted: all
            regions were either double-stranded or sequenced with an alternate
            chemistry or covered by high quality data (i.e., phred quality >=
            30); an attempt was made to resolve all sequencing problems, such
            as compressions and repeats; all regions were covered by at least
            one plasmid subclone or more than one M13 subclone; and the
            assembly was confirmed by restriction digest. The following
            abbreviations are used to associate primary accession numbers given
            in the feature table with their source databases: Em:, EMBL; Sw:,
            SWISSPROT; Tr:, TREMBL; Wp:, WORMPEP; Information on the WORMPEP
            database can be found at
            http://www.sanger.ac.uk/Projects/C_elegans/wormpep This sequence
            was generated from part of bacterial clone contigs of human
            chromosome 6, constructed by the Sanger Centre Chromosome 6 Mapping
            Group. Further information can be found at
            http://www.sanger.ac.uk/HGP/Chr6
            RP11-437J19 is from the library RPCI-11.2 constructed by the group
            of Pieter de Jong. For further details see
            http://www.chori.org/bacpac/home.htm
            VECTOR: pBACe3.6
            IMPORTANT: This sequence is not the entire insert of clone
            RP11-437J19 It may be shorter because we sequence overlapping
            sections only once, except for a 100 base overlap.
            The true left end of clone RP1-124C6 is at 94456 in this sequence.
            The true right end of clone RP1-132N8 is at 100 in this sequence.
FEATURES             Location/Qualifiers
     source          1..94555
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="6"
                     /clone="RP11-437J19"
                     /clone_lib="RPCI-11.2"
     repeat_region   5..224
                     /note="MER1B repeat: matches 1..222 of consensus"
     repeat_region   225..469
                     /note="L1MC3 repeat: matches 7460..7722 of consensus"
     repeat_region   1509..2925
                     /note="L1M4 repeat: matches 3213..4663 of consensus"
     repeat_region   2927..3676
                     /note="L1PA7 repeat: matches 5358..6137 of consensus"
     repeat_region   3699..5060
                     /note="L1MEc repeat: matches 2212..3224 of consensus"
     repeat_region   5298..5729
                     /note="L1M4c repeat: matches 1546..1973 of consensus"
     repeat_region   5812..6559
                     /note="L1M4c repeat: matches 615..1388 of consensus"
     repeat_region   6560..6952
                     /note="MSTA repeat: matches 1..426 of consensus"
     repeat_region   6953..7256
                     /note="L1M4c repeat: matches 318..615 of consensus"
     repeat_region   7334..7730
                     /note="L1MB8 repeat: matches 5760..6165 of consensus"
     repeat_region   9532..9885
                     /note="THE1C repeat: matches 1..371 of consensus"
     repeat_region   11134..11181
                     /note="24 copies 2 mer ta 83% conserved"
     repeat_region   11138..11181
                     /note="11 copies 4 mer tata 86% conserved"
     repeat_region   11182..11255
                     /note="37 copies 2 mer at 71% conserved"
     repeat_region   11218..11257
                     /note="10 copies 4 mer atat 90% conserved"
     repeat_region   13859..16586
                     /note="L1PA13 repeat: matches 3426..6153 of consensus"
     repeat_region   16647..16971
                     /note="L2 repeat: matches 2340..2750 of consensus"
     repeat_region   17554..17737
                     /note="MIR repeat: matches 49..233 of consensus"
     repeat_region   18069..18376
                     /note="AluSx repeat: matches 1..308 of consensus"
     repeat_region   18474..18699
                     /note="MIR repeat: matches 11..261 of consensus"
     repeat_region   18842..18885
                     /note="11 copies 4 mer caca 88% conserved"
     repeat_region   19896..20063
                     /note="MER77 repeat: matches 445..631 of consensus"
     repeat_region   20073..20108
                     /note="MER77 repeat: matches 274..308 of consensus"
     repeat_region   20109..20345
                     /note="MER46A repeat: matches 1..235 of consensus"


     repeat_region   zu o4 b . 0 n a "5 tr . ZU4 OJ
                     /note="iXihjK / / repeat: matches 101 . . z/4 of consensus"
     repeat_region   01 CQQ z 1 00 y . . Z Z U OU
                     /note="LirAlo repeat: matches o//y. . bijo of consensus"
     repeat_region   z Z Uoz . 001 1 T . zz 1 1 /
                     /note="9 copies 4 mer aaat 86% conserved"
     repeat_region   Z Z Z 0 Z . . z o i y o
                     /note="LiiLYiuz repeat: matches o4U4. . dj£j of consensus"
     repeat_region   0 0 0 n q . z o oz y
                     /note="11 copies 2 mer ca 100% conserved"
     repeat_region   9 0 0 c c 9 "3 4 9 7 . Z J 4 Z /
                     /note="Liz repeat: matches zooy. . zoyo of consensus"
     repeat_region   ZQDZo . . z4 oUU
                     /note="MIR repeat: matches 4..262 of consensus"
     repeat_region   O A Q O "3 Z 4 OZ O . . ZdZ 1 4
                     /note="MbK/413 repeat: matches i/o. . ozz of consensus"
     repeat_region   Z DoZ 0 . . zo4 Jo
                     /note="ixihjK/4A. repeat: matches i. .110 of consensus"
     repeat_region   z bUol . . ZoZZl
                     /note="rihiKoA repeat: matches z. .loy of consensus"
     repeat_region   Z / OOO . OH A AO . Z / 4 4 z
                     /note="L1PA4 repeat: matches 6055..6144 of consensus"
     repeat_region   ZoUbO. . z o u yz
                     /note="14 copies 2 mer tg 92% conserved"
     repeat_region   z O 0 1 1 . . z o o d y
                     /note="Lz repeat: matches z/UU. .z/4o of consensus"
     repeat_region   n Q Q c c Z O O D 0 . . z yuoo
                     /note="MLTU repeat: matches 1..191 of consensus"
     repeat_region   OQ1 CO z y 1 o 0 . 0 QOll . z y z / /
                     /note="60 copies 2 mer ag 64% conserved"
     repeat_region   z y z / y . o on n p . z y / u o
                     /note="MLizL-D repeat: matches i. .400 of consensus"
     repeat_region   z y / 1 d . . z y o 4 u
                     /note="MLTU repeat: matches 385..516 of consensus"
     repeat_region   ou y 4 1 . . Ol DO O
                     /note="L1M4 repeat: matches 4877..5642 of consensus"
     repeat_region   oz U 1 4 . . z>Z 1 u /
                     /note="77 copies 2 mer tt 74% conserved"
     repeat_region   ozUi y . . oz lo4
                     /note="4 copies 34 mer 76% conserved"
     repeat_region   oz U 4 0 . •501 AO . oZ lUo
                     /note="16 copies 4 mer tttc 100% conserved"
     repeat_region   oz 11 y . . 5Z 1 / U
                     /note="13 copies 4 mer ttct 98% conserved"
     repeat_region   3217 0. . 3z 4 / /
                     /note="AluSc repeat: matches 1..309 of consensus"
     repeat_region   oz y 1 0 . o o n c /i ; OZ yo 4
                     /note="20 copies 2 mer ac 82% conserved"
     repeat_region   0 0 0 0 n . OOOOO
                     /note="AluJb repeat: matches 3..303 of consensus"
     repeat_region   34082..34285
                     /note="6 copies 34 mer 71% conserved"
     repeat_region   34083..34274
                     /note="48 copies 4 mer aaag 71% conserved"
     repeat_region   35357..35409
                     /note="MER5A repeat: matches 16..65 of consensus"



     repeat_region   35499..35516
                     /note="MER5A repeat: matches 65..184 of consensus"
     repeat_region   35547..36313
                     /note="L1MEc repeat: matches 583..1464 of consensus"
     repeat_region   38689..38890
                     /note="L2 repeat: matches 2544..2750 of consensus"
     repeat_region   39096..39409
                     /note="AluY repeat: matches 1..311 of consensus"
     repeat_region   39875..40054
                     /note="MLT1D repeat: matches 52..218 of consensus"
     repeat_region   40055..40409
                     /note="THE1B repeat: matches 1..364 of consensus"
     repeat_region   40410..40651
                     /note="MLT1D repeat: matches 218..503 of consensus"
     repeat_region   40936..41082
                     /note="L2 repeat: matches 2560..2708 of consensus"
     repeat_region   42602..42792
                     /note="MER5A repeat: matches 7..187 of consensus"
     repeat_region   42845..42940
                     /note="L2 repeat: matches 2611..2750 of consensus"
     repeat_region   42955..43071
                     /note="MER45 repeat: matches 10..120 of consensus"
     repeat_region   43357..43527
                     /note="MER5A repeat: matches 1..189 of consensus"
     repeat_region   44098..44231
                     /note="67 copies 2 mer aa 58% conserved"
     repeat_region   45604..45714
                     /note="L1PA13 repeat: matches 6047..6156 of consensus"
     repeat_region   45760..45911
                     /note="76 copies 2 mer aa 57% conserved"
     repeat_region   47006..47308
                     /note="AluSx repeat: matches 1..302 of consensus"
     repeat_region   47889..47982
                     /note="MER81 repeat: matches 20..113 of consensus"
     repeat_region   48733..48764
                     /note="MLT1J repeat: matches 108..139 of consensus"
     repeat_region   49237..49306
                     /note="MER31-internal repeat: matches 4819..4886 of
                     consensus"
     repeat_region   49311..49747
                     /note="MER31B repeat: matches 1..450 of consensus"



Query Match 8.6%; Score 36.2; DB 9; Length 94555;
Best Local Similarity 50.9%; Pred. No. 4.1;
Matches 86; Conservative 0; Mismatches 83; Indels 0; Gaps 0;



Qy 216 cgaagctgtgctgcaagtatctgactggggtttcagcctatctgacatccaactgcagaa 275 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 21249 CAAATCAGAGAAGCTAGTGTCAGAAGAGGTTTTAGGAAGGTGAAGATTAGATGTGCAGGC 21190

Qy 276 gaaagaggctcaaggcttttttgaactcatcacgtctctgttcaatcatgctgaaaaaca 335

I I I I I I M Ml I II I I Mil I I I I I I I I II 

Db 21189 GGATGAGGCTAGAGCCAAATAAGCAGATTACTAGGTTATGTGTGCTTATGGGGAAAGGCA 21130 



Qy 336 gtgggtgggatttctgggcccaatacatatatcgcaggggatagatgac 384 

I I I I III I II I I I I I I I I I I I I II I II 
Db 21129 GTGGCAGGAAAATTGGGTGCCAAGAGATAAATGCCCAGCCATTGGCAAC 21081



RESULT 13
AC024632/C
LOCUS       AC024632   168438 bp    DNA             HTG       03-MAR-2000
DEFINITION  Homo sapiens chromosome 6 clone RP11-437J19 map 6, WORKING DRAFT
            SEQUENCE, 25 unordered pieces.
ACCESSION   AC024632
VERSION     AC024632.1  GI:7139757
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 168438)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 6, clone RP11-437J19
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 168438)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
            Anderson,S., Baldwin,J., Barna,N., Bastien,V., Beda,F.,
            Boguslavkiy,L., Boukhgalter,B., Brown,A., Burkett,G.,
            Campopiano,A., Castle,A., Choepel,Y., Colangelo,M., Collins,S.,
            Collymore,A., Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S.,
            Dodge,S., Domino,M., Doyle,M., Ferreira,P., FitzHugh,W., Gage,D.,
            Galagan,J., Gardyna,S., Ginde,S., Goyette,M., Graham,L.,
            Grand-Pierre,N., Grant,G., Hagos,B., Heaford,A., Horton,L.,
            Howland,J.C., Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A.,
            Klein,J., LaRocque,K., Lamazares,R., Landers,T., Lehoczky,J.,
            Levine,R., Lieu,C., Liu,G., Locke,K., Macdonald,P., Marquis,N.,
            McCarthy,M., McEwan,P., McGurk,A., McKernan,K., McPheeters,R.,
            Meldrim,J., Meneus,L., Mihova,T., Miranda,C., Mlenga,V., Morrow,J.,
            Murphy,T., Naylor,J., Norman,C.H., O'Connor,T., O'Donnell,P.,
            O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K., Pierre,N.,
            Pisani,C., Pollara,V., Raymond,C., Riley,R., Rogov,P., Rothman,D.,
            Roy,A., Santos,R., Schauer,S., Severy,P., Spencer,B.,
            Stange-Thomann,N., Stojanovic,N., Subramanian,A., Talamas,J.,
            Tesfaye,S., Theodore,J., Tirrell,A., Travers,M., Trigilio,J.,
            Vassiliev,H., Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J.,
            Young,G., Zainoun,J., Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-MAR-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     All repeats were identified using RepeatMasker:
              Smit, A.F.A. & Green, P. (1996-1997)
              http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
              Center: Whitehead Institute/ MIT Center for Genome Research
              Center code: WIBR
              Web site: http://www-seq.wi.mit.edu
              Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
              Center project name: L6062
              Center clone name: 437_J_19
            Summary Statistics
              Sequencing vector: M13; M77815; 100% of reads
              Chemistry: Dye-terminator Big Dye; 100% of reads
              Assembly program: Phrap; version 0.960731
              Consensus quality: 154266 bases at least Q40
              Consensus quality: 161752 bases at least Q30
              Consensus quality: 164265 bases at least Q20
              Insert size: 180000; agarose-fp
              Insert size: 166038; sum-of-contigs
              Quality coverage: 3.6 in Q20 bases; agarose-fp
              Quality coverage: 3.9 in Q20 bases; sum-of-contigs
            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 25 contigs. The true order of the pieces
            * is not known and their order in this sequence record is
            * arbitrary. Gaps between the contigs are represented as
            * runs of N, but the exact sizes of the gaps are unknown.
            * This record will be updated with the finished sequence
            * as soon as it is available and the accession number will
            * be preserved.










            *       1    1505: contig of 1505 bp in length
            *    1506    1605: gap of 100 bp
            *    1606    3527: contig of 1922 bp in length
            *    3528    3627: gap of 100 bp
            *    3628    5122: contig of 1495 bp in length
            *    5123    5222: gap of 100 bp
            *    5223    8133: contig of 2911 bp in length
            *    8134    8233: gap of 100 bp
            *    8234    9885: contig of 1652 bp in length
            *    9886    9985: gap of 100 bp
            *    9986   12052: contig of 2067 bp in length
            *   12053   12152: gap of 100 bp
            *   12153   14403: contig of 2251 bp in length
            *   14404   14503: gap of 100 bp
            *   14504   18869: contig of 4366 bp in length
            *   18870   18969: gap of 100 bp
            *   18970   21983: contig of 3014 bp in length
            *   21984   22083: gap of 100 bp
            *   22084   26566: contig of 4483 bp in length
            *   26567   26666: gap of 100 bp
            *   26667   30100: contig of 3434 bp in length
            *   30101   30200: gap of 100 bp
            *   30201   33413: contig of 3213 bp in length
            *   33414   33513: gap of 100 bp
            *   33514   36961: contig of 3448 bp in length
            *   36962   37061: gap of 100 bp
            *   37062   42449: contig of 5388 bp in length
            *   42450   42549: gap of 100 bp
            *   42550   47375: contig of 4826 bp in length
            *   47376   47475: gap of 100 bp
            *   47476   51908: contig of 4433 bp in length
            *   51909   52008: gap of 100 bp
            *   52009   57797: contig of 5789 bp in length
            *   57798   57897: gap of 100 bp
            *   57898   64548: contig of 6651 bp in length
            *   64549   64648: gap of 100 bp
            *   64649   72958: contig of 8310 bp in length
            *   72959   73058: gap of 100 bp
            *   73059   85343: contig of 12285 bp in length
            *   85344   85443: gap of 100 bp
            *   85444   97769: contig of 12326 bp in length
            *   97770   97869: gap of 100 bp
            *   97870  108295: contig of 10426 bp in length
            *  108296  108395: gap of 100 bp
            *  108396  123953: contig of 15558 bp in length
            *  123954  124053: gap of 100 bp
            *  124054  142803: contig of 18750 bp in length
            *  142804  142903: gap of 100 bp
            *  142904  168438: contig of 25535 bp in length.
FEATURES             Location/Qualifiers
     source          1..168438
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="6"
                     /map="6"
                     /clone="RP11-437J19"
                     /clone_lib="RPCI-11 Human Male BAC"



     misc_feature    1..1505
                     /note="assembly_fragment"
     misc_feature    1606..3527
                     /note="assembly_fragment"
     misc_feature    3628..5122
                     /note="assembly_fragment"
     misc_feature    5223..8133
                     /note="assembly_fragment"
     misc_feature    8234..9885
                     /note="assembly_fragment"
     misc_feature    9986..12052
                     /note="assembly_fragment"
     misc_feature    12153..14403
                     /note="assembly_fragment"
     misc_feature    14504..18869
                     /note="assembly_fragment"
     misc_feature    18970..21983
                     /note="assembly_fragment"
     misc_feature    22084..26566
                     /note="assembly_fragment"
     misc_feature    26667..30100
                     /note="assembly_fragment"
     misc_feature    30201..33413
                     /note="assembly_fragment"
     misc_feature    33514..36961
                     /note="assembly_fragment
                     clone_end:T7
                     vector_side:right"
     misc_feature    37062..42449
                     /note="assembly_fragment"
     misc_feature    42550..47375
                     /note="assembly_fragment"
     misc_feature    47476..51908
                     /note="assembly_fragment"
     misc_feature    52009..57797
                     /note="assembly_fragment"
     misc_feature    57898..64548
                     /note="assembly_fragment"
     misc_feature    64649..72958
                     /note="assembly_fragment"
     misc_feature    73059..85343
                     /note="assembly_fragment"
     misc_feature    85444..97769
                     /note="assembly_fragment"
     misc_feature    97870..108295
                     /note="assembly_fragment"
     misc_feature    108396..123953
                     /note="assembly_fragment"
     misc_feature    124054..142803
                     /note="assembly_fragment
                     clone_end:SP6
                     vector_side:right"
     misc_feature    142904..168438
                     /note="assembly_fragment"
BASE COUNT    52247 a  30934 c  30484 g  52372 t   2401 others
ORIGIN 



Query Match 8.6%; Score 36.2; DB 2; Length 168438; 

Best Local Similarity 50.9%; Pred. No. 4.3; 

Matches 86; Conservative 0; Mismatches 83; Indels 0; Gaps 0;

Qy 216 cgaagctgtgctgcaagtatctgactggggtttcagcctatctgacatccaactgcagaa 275

I II I I Ml INN II I I I I I I I I I M I I I 

Db 161456 CAAATCAGAGAAGCTAGTGTCAGAAGAGGTTTTAGGAAGGTGAAGATTAGATGTGCAGGC 161397

Qy 276 gaaagaggctcaaggcttttttgaactcatcacgtctctgttcaatcatgctgaaaaaca 335

I I I I I I I I III I II I 1 I I I I I I I I I I 1 I M 

Db 161396 GGATGAGGCTAGAGCCAAATAAGCAGATTACTAGGTTATGTGTGCTTATGGGGAAAGGCA 161337 

Qy 336 gtgggtgggatttctgggcccaatacatatatcgcaggggatagatgac 384 

I I I I II I I II I II I I I I I I I I I III II 
Db 161336 GTGGCAGGAAAATTGGGTGCCAAGAGATAAATGCCCAGCCATTGGCAAC 161288 



RESULT 14
AF137266
LOCUS       AF137266      824 bp    DNA             PLN       09-JAN-2001
DEFINITION  Nuphar lutea CT dinucleotide repeat microsatellite sequence.
ACCESSION   AF137266
VERSION     AF137266.1  GI:5733431
KEYWORDS    .
SOURCE      Nuphar lutea.
  ORGANISM  Nuphar lutea
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Nymphaeaceae; Nuphar.
REFERENCE   1  (bases 1 to 824)
  AUTHORS   Ouborg,N.J., Goodall-Copestake,W.P., Saumitou-Laprade,P., Bonnin,I.
            and Epplen,J.T.
  TITLE     Novel polymorphic microsatellite loci isolated from the yellow
            waterlily, Nuphar lutea
  JOURNAL   Mol. Ecol. 9 (4), 497-498 (2000)
  MEDLINE   20200292
   PUBMED   10736057
REFERENCE   2  (bases 1 to 824)
  AUTHORS   Ouborg,N.J., Goodall-Copestake,W.P., Saumitou-Laprade,P. and
            Epplen,J.T.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-MAR-1999) Dept. of Ecology, Univ of Nijmegen,
            Toernooiveld 1, Nijmegen 6525 ED, Netherlands
FEATURES             Location/Qualifiers
     source          1..824
                     /organism="Nuphar lutea"
                     /db_xref="taxon:77113"
     repeat_region   271..316
                     /note="microsatellite"
                     /rpt_type=tandem
                     /rpt_unit=CT
BASE COUNT      223 a    193 c    163 g    228 t     17 others
ORIGIN



Query Match 8.6%; Score 36; DB 8; Length 824; 

Best Local Similarity 64.3%; Pred. No. 2.9; 

Matches 54; Conservative 0; Mismatches 30; Indels 0; Gaps 0;



Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319 

I 1 I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I 
Db 345 ACATCCAACGGTAGATATCAGAGGCTCAAGCCTGATGAAAAATCGTCACGGCCAAGAAGA 404

Qy 320 atcatgctgaaaaacagtgggtgg 343 

I II I I I M I I I I I I I 
Db 405 AGATGGCTGAGAATCAAAGGGAGG 428



RESULT 15
AC010628

LOCUS       AC010628 101261 bp DNA HTG 18-JUL-2000
DEFINITION  Homo sapiens chromosome 5 clone CTD-2180L11, WORKING DRAFT
            SEQUENCE, 16 ordered pieces.
ACCESSION   AC010628
VERSION     AC010628.4 GI:9256262
KEYWORDS    HTG; HTGS_PHASE2; HTGS_DRAFT.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 101261)
  AUTHORS   DOE Joint Genome Institute.
  TITLE     Sequencing of Human Chromosome 5
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 101261)
  AUTHORS   DOE Joint Genome Institute.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-SEP-1999) Production Sequencing Facility, DOE Joint
            Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
COMMENT     On Jul 18, 2000 this sequence version replaced gi:7710608.

            Genome Center
            Center: Joint Genome Institute
            Center Code: JGI
            Web site: http://www.jgi.doe.gov

            Project Information
            Center Project Name: 696944, H468
            Center clone name: CITB-H1 2180L11

            Summary Statistics
            Consensus quality: 76583 bases at least Q40
            Consensus quality: 90678 bases at least Q30
            Consensus quality: 96052 bases at least Q20
            Estimated insert size: 112000; pulse field gel estimation
            Estimated insert size: 100561; sum-of-contigs estimation
            Quality coverage: 3.95 in Q20 bases; pulse field gel estimation
            Quality coverage: 4.4 in Q20 bases; sum-of-contigs estimation

            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 16 contigs. Gaps between the contigs
            * are represented as runs of N. The order of the pieces
            * is believed to be correct as given, however the sizes
            * of the gaps between them are based on estimates that have
            * provided by the submittor.
            * This sequence will be replaced
            * by the finished sequence as soon as it is available and
            * the accession number will be preserved.
            *       1    2349: contig of 2349 bp in length
            *    2350    2449: gap of unknown length
            *    2450    9244: contig of 6795 bp in length
            *    9245    9344: gap of unknown length
            *    9345   12209: contig of 2865 bp in length
            *   12210   12309: gap of unknown length
            *   12310   14574: contig of 2265 bp in length
            *   14575   14674: gap of unknown length
            *   14675   16514: contig of 1840 bp in length
            *   16515   16614: gap of unknown length
            *   16615   17989: contig of 1375 bp in length
            *   17990   18089: gap of unknown length
            *   18090   37668: contig of 19579 bp in length
            *   37669   37768: gap of unknown length
            *   37769   41885: contig of 4117 bp in length
            *   41886   41985: gap of unknown length
            *   41986   51375: contig of 9390 bp in length
            *   51376   51475: gap of unknown length
            *   51476   54268: contig of 2793 bp in length
            *   54269   54368: gap of unknown length
            *   54369   55918: contig of 1550 bp in length
            *   55919   56018: gap of unknown length
            *   56019   58548: contig of 2530 bp in length
            *   58549   58648: gap of unknown length
            *   58649   64654: contig of 6006 bp in length
            *   64655   64754: gap of unknown length
            *   64755   76433: contig of 11679 bp in length
            *   76434   76533: gap of unknown length
            *   76534   82850: contig of 6317 bp in length
            *   82851   82950: gap of unknown length
            *   82951  101261: contig of 18311 bp in length
FEATURES             Location/Qualifiers
     source          1..101261
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="5"
                     /clone="CTD-2180L11"
                     /clone_lib="CalTech human BAC library D"


BASE COUNT 29116 a 17494 c 19600 g 33526 t 1525 others 
ORIGIN 



Query Match 8.5%; Score 35.6; DB 2; Length 101261; 

Best Local Similarity 54.6%; Pred. No. 6.3; 

Matches 71; Conservative 0; Mismatches 59; Indels 0; Gaps 0;

Qy 28 ttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgccttttcgc 87 

I 1 I I II II I I I I I I II I I II I I I I I I III 

Db 40727 TTTGCATAGAAAAATTTATTATTTTGTATATCTCATAAATACCTAAATCTCATTTTCAAA 40786 

Qy 88 gttccaattactaatgttacggcattattcaggacagaactttactggaacgtcctgtgt 147

I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I 

Db 40787 ATTCTATTTGTTAATTTTACTAAATGATGATATACAGTATTGTACTTTAAGACCCTGATT 40846

Qy 148 tcaatgcatt 157 

I III III 
Db 40847 CCTATGAATT 40856 



Search completed: February 7, 2002, 10:55:05 
Job time: 9231 sec 

GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 

Run on: February 7, 2002, 10:59:33 ; Search time 428.31 Seconds 

(without alignments) 
842.693 Million cell updates/sec

Title: US-09-394-745-5950

Perfect score: 421 

Sequence: 1 gggtccaggcacgcgtccga agtggcagaatttgtgccgc 421 

Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 

Searched: 930621 seqs, 428662619 residues 

Total number of hits satisfying chosen parameters: 1861242 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database : N_Geneseq_1101 : * 



1: /SIDS2/gcgdata/geneseq/geneseqn/NA1980.DAT:*
2: /SIDS2/gcgdata/geneseq/geneseqn/NA1981.DAT:*
3: /SIDS2/gcgdata/geneseq/geneseqn/NA1982.DAT:*
4: /SIDS2/gcgdata/geneseq/geneseqn/NA1983.DAT:*
5: /SIDS2/gcgdata/geneseq/geneseqn/NA1984.DAT:*
6: /SIDS2/gcgdata/geneseq/geneseqn/NA1985.DAT:*
7: /SIDS2/gcgdata/geneseq/geneseqn/NA1986.DAT:*
8: /SIDS2/gcgdata/geneseq/geneseqn/NA1987.DAT:*
9: /SIDS2/gcgdata/geneseq/geneseqn/NA1988.DAT:*
10: /SIDS2/gcgdata/geneseq/geneseqn/NA1989.DAT:*
11: /SIDS2/gcgdata/geneseq/geneseqn/NA1990.DAT:*
12: /SIDS2/gcgdata/geneseq/geneseqn/NA1991.DAT:*
13: /SIDS2/gcgdata/geneseq/geneseqn/NA1992.DAT:*
14: /SIDS2/gcgdata/geneseq/geneseqn/NA1993.DAT:*
15: /SIDS2/gcgdata/geneseq/geneseqn/NA1994.DAT:*
16: /SIDS2/gcgdata/geneseq/geneseqn/NA1995.DAT:*
17: /SIDS2/gcgdata/geneseq/geneseqn/NA1996.DAT:*
18: /SIDS2/gcgdata/geneseq/geneseqn/NA1997.DAT:*
19: /SIDS2/gcgdata/geneseq/geneseqn/NA1998.DAT:*
20: /SIDS2/gcgdata/geneseq/geneseqn/NA1999.DAT:*
21: /SIDS2/gcgdata/geneseq/geneseqn/NA2000.DAT:*
22: /SIDS2/gcgdata/geneseq/geneseqn/NA2001.DAT:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                  %
Result          Query
   No.  Score   Match  Length  DB  ID        Description

     1   34.8     8.3     936  22  AAF58252  Oligonucleotide D1
     2   34.8     8.3     936  22  AAF58254  Oligonucleotide D1
     3   34.8     8.3     936  22  AAF58257  Oligonucleotide D1
     4   34.8     8.3     936  22  AAF58259  Oligonucleotide D2
     5   34.8     8.3     936  22  AAF58262  Oligonucleotide D2
     6   34.8     8.3     938  22  AAF58255  Oligonucleotide D1
c    7   34.2     8.1     786  22  AAH07451  Human cDNA clone (
c    8   34.2     8.1    1753  22  AAH16513  Human cDNA sequenc
c    9   33.6     8.0     936  22  AAF58252  Oligonucleotide D1
c   10   33.6     8.0     936  22  AAF58254  Oligonucleotide D1
c   11   33.6     8.0     936  22  AAF58257  Oligonucleotide D1
c   12   33.6     8.0     936  22  AAF58259  Oligonucleotide D2
c   13   33.6     8.0     936  22  AAF58262  Oligonucleotide D2
c   14   33.6     8.0     938  22  AAF58255  Oligonucleotide D1
    15     33     7.8     752  20  AAX98756  Human validated ca
c   16   32.6     7.7     878  22  AAH07610  Human cDNA clone (
c   17   32.6     7.7    2187  22  AAH14871  Human cDNA sequenc
c   18   32.6     7.7    7418  22  AAI58480  Human polynucleoti
c   19     31     7.4    2878  15  AAQ54482  Excitatory amino a
c   20     31     7.4    2878  16  AAQ91232  Human EAA4 recepto
c   21     31     7.4    2878  22  AAC62038  cDNA encoding a un
c   22     31     7.4    2878  22  AAC62041  cDNA encoding a fo
c   23     31     7.4    2878  22  AAC62042  cDNA encoding a fo
c   24     31     7.4  236303  22  AAS11614  Human genomic DNA
    25   30.6     7.3     547  22  AAF68173  Human lung tumour
c   26   30.6     7.3    1110  21  AAC45497  Arabidopsis thalia
    27   30.2     7.2     700  22  AAH92694  Human inflammatory
    28   30.2     7.2    3975  21  AAC51553  Arabidopsis thalia





zy 


■3 ri 


/ . 


. 1 


i ice 


O 1 

Z X 


AA/iU 1 y 0 D 


Human colon cancer 




*3 r\ 




/ . 


. l 


Id / o 


1 0 


*3 a c; 
pj\l y i juo 


nuiuan j o y fi i secre 


c 


jl 




o 


. i 


O /I OA 
Z4 UU 


1 H 


AA(J 4 Z 4 y D 


tun lengtn nuiuan 


c 


oZ 




n 

I . 






O 1 
Z 1 


AAiif! y q / u 


numan wi j.a type du 


c 


O O 




1 , 


. 1 


Z4 1 0 


O 1 
Z 1 


pj\6h y 4 / i 


Human Butyrylcholi 


c 


"3 /I 


OQ Q 

z y , o 


-7 


, 1 


"3 Q 1 

j y i 


0 Pi 

Z U 


PJ\6Z o U UZ 


NucieoLiae sequenc 


c 


"3 c: 
JJ 


z y . o 


1 , 


, 1 


"3 Q 1 

o y x 


O 1 
Z 1 


AA/\y ouz y 


nuiuan lecuomeuin i 


c 


"3 c 
OD 


z y . b 


I , 


. U 




ZU 


X 4 Z 3 


L-Oiripieu© genoine se 


c 


O / 


z y . 4 


1 , 


, U 


OCT 

y do 


ZZ 




nuiuan oxracL-oxry ire 


c 


jo 


on / 

z y . 4 


/ . 


. U 


"3 O *3 C 

jy jo 


Z 1 


7\ 7\ n Q Q A CO 

AAZo y 4 Do 


Murine trans— synap 


c 


o y 


z y . 4 


n 

I , 


, U 


1 o D4 y / b 


i y 


a auoi ono 
A/\VZ 1 z u y 


i v je unanococcub jann 




    40   29.2     6.9    1013  21  AAZ51231  Staphylococcus aur
    41   29.2     6.9    8779  18  AAV74369  Staphylococcus aur
c   42   29.2     6.9   89047  22  AAF28547  Genomic fragment #
c   43     29     6.9    1112  21  AAC40621  Arabidopsis thalia
    44     29     6.9    1544  22  AAH02939  Human shear stress
    45     29     6.9    2204  16  AAQ87426  Human GRK cDNA #2.



ALIGNMENTS 



RESULT 1 
AAF58252 

ID AAF58252 standard; DNA; 936 BP. 
XX 

AC AAF58252; 
XX 

DT 24-APR-2001 (first entry) 
XX 

DE Oligonucleotide D1835. 
XX 

KW Electron-transfer group; ETM; mismatch; genotyping; 

KW gene expression; ss. 

XX 

OS Synthetic. 
XX 

PN WO200107665-A2.
XX

PD 01-FEB-2001.
XX

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX 

PA (CLIN-) CLINICAL MICRO SENSORS INC. 
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 

PT a single surface 
XX 

PS Example 6; Page 127; 159pp; English. 



XX 

CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic 

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 

XX 

SQ Sequence 936 BP; 4 A; 139 C; 10 G; 7 T; 776 other; 

Query Match 8.3%; Score 34.8; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.18; 



Matches 6; Conservative 195; Mismatches 147; Indels 0; Gaps 0;






Qy 20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79

Db 192 wwwwgwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 251

Qy 80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139

Db 252 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwcwwwwwwwwwwwwwwwww 311

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199

Db 312 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 371

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259

Db 372 wwwwwwwwwwwwwwwwwwwwgcttawwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 431

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319

Db 432 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 491

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367

Db 492 wwcwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 539





RESULT 2
AAF58254

ID AAF58254 standard; DNA; 936 BP.
XX

AC AAF58254;
XX

DT 24-APR-2001 (first entry)
XX

DE Oligonucleotide D1875.
XX

KW Electron-transfer group; ETM; mismatch; genotyping;

KW gene expression; ss.

XX

OS Synthetic.
XX

PN WO200107665-A2.
XX


PD 01-FEB-2001. 
XX 

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX

PA (CLIN-) CLINICAL MICRO SENSORS INC.
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 

PT a single surface 
XX 

PS Example 6; Page 127; 159pp; English. 
XX 

CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic 

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 

XX 

SQ Sequence 936 BP; 4 A; 144 C; 7 G; 5 T; 776 other; 



Query Match 8.3%; Score 34.8; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.18; 

Matches 6; Conservative 195; Mismatches 147; Indels 0; Gaps 0;

Qy 20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79 

:■: : : I : : : : : : : : : : : : : : : : : : : : : ::::::: : : : : : : : 
Db 192 wwwwgwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 251 

Qy 80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139 

Db 252 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwcwwwwwwwwwwwwwwwww 311 

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199 

Db 312 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 371 

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259 

Db 372 wwwwwwwwwwwwwwwwwwwwgcttawwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 431

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319

Db 432 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 491

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367

Db 492 wwcwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 539



RESULT 3
AAF58257

ID AAF58257 standard; DNA; 936 BP.
XX

AC AAF58257;
XX

DT 24-APR-2001 (first entry)
XX

DE Oligonucleotide Diyo^.
XX

KW Electron-transfer group; ETM; mismatch; genotyping;

KW gene expression; ss.

XX

OS Synthetic.
XX

PN WO200107665-A2.
XX

PD 01-FEB-2001.
XX

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX

PA (CLIN-) CLINICAL MICRO SENSORS INC.
XX

PI Umek RM;
XX

DR WPI; 2001-159728/16.
XX

PT Nucleic acids containing electron-transfer group, useful as labels in

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on

PT a single surface
XX

PS Example 6; Page 127; 159pp; English.
XX

CC The present invention relates to a composition comprising two nucleic

CC acids each containing an electron-transfer group (ETM) having

CC different redox potentials. The invention is used for electronic

CC detection of nucleic acids, especially of substitutions (mismatches)

CC and single-nucleotide polymorphisms, e.g. for genotyping,

CC monitoring gene expression.

XX

SQ Sequence 936 BP; 5 A; 142 C; 7 G; 6 T; 776 other;



Query Match 8.3%; Score 34.8; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.18; 

Matches 6; Conservative 195; Mismatches 147; Indels 0; Gaps 0; 
Qy 20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79

Db 192 wwwwgwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 251

Qy 80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139

Db 252 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwcwwwwwwwwwwwwwwwww 311

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199

Db 312 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 371

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259

Db 372 wwwwwwwwwwwwwwwwwwwwgcttawwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 431

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319

Db 432 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 491

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367

Db 492 wwcwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 539





RESULT 4
AAF58259

ID AAF58259 standard; DNA; 936 BP.
XX

AC AAF58259;
XX

DT 24-APR-2001 (first entry)
XX

DE Oligonucleotide D2004.
XX

KW Electron-transfer group; ETM; mismatch; genotyping;

KW gene expression; ss.

XX

OS Synthetic.
XX

PN WO200107665-A2.
XX

PD 01-FEB-2001.
XX

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX

PA (CLIN-) CLINICAL MICRO SENSORS INC.
XX

PI Umek RM;
XX

DR WPI; 2001-159728/16.
XX

PT Nucleic acids containing electron-transfer group, useful as labels in

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on

PT a single surface
XX

PS Example 6; Page 128; 159pp; English.
XX

CC The present invention relates to a composition comprising two nucleic

CC acids each containing an electron-transfer group (ETM) having



CC different redox potentials. The invention is used for electronic 

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 

XX 

SQ Sequence 936 BP; 6 A; 138 C; 8 G; 8 T; 776 other; 

Query Match 8.3%; Score 34.8; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.18; 

Matches 6; Conservative 195; Mismatches 147; Indels 0; Gaps 0;

Qy 20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79 

: : : : I : : : : ::::::::::::::: : : :::::::::::::: 
Db 192 wwwwgwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 251 

Qy 80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139 

Db 252 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwcwwwwwwwwwwwwwwwww 311 

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199

Db 312 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 371

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259

Db 372 wwwwwwwwwwwwwwwwwwwwgcttawwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 431

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319

Db 432 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 491

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367

Db 492 wwcwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 539



RESULT 5 
AAF58262 

ID AAF58262 standard; DNA; 936 BP. 
XX 

AC AAF58262; 
XX 

DT 24-APR-2001 (first entry) 
XX 

DE Oligonucleotide D2007. 
XX 

KW Electron-transfer group; ETM; mismatch; genotyping; 

KW gene expression; ss. 

XX 

OS Synthetic. 
XX 

PN WO200107665-A2. 
XX 

PD 01-FEB-2001. 
XX 

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX 

PA (CLIN-) CLINICAL MICRO SENSORS INC. 
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 

PT a single surface 
XX 

PS Example 6; Page 128; 159pp; English. 
XX 

CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 

XX 

SQ Sequence 936 BP; 5 A; 139 C; 10 G; 6 T; 776 other; 



Query Match 8.3%; Score 34.8; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.18; 



Matches 6; Conservative 195; Mismatches 147; Indels 0; Gaps 0;


Qy 20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79
: : : : | : : : : ::::::::::::::: : : :::::::::::::: 
Db 192 wwwwgwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 251

Qy 80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139

Db 252 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwcwwwwwwwwwwwwwwwww 311

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199

Db 312 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 371

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259

Db 372 wwwwwwwwwwwwwwwwwwwwgcttawwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 431

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319

Db 432 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 491

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367

Db 492 wwcwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 539





RESULT 6 
AAF58255 



ID AAF58255 standard; DNA; 938 BP. 
XX 

AC AAF58255; 
XX 

DT 24-APR-2001 (first entry) 
XX 

DE Oligonucleotide D1876. 
XX 

KW Electron-transfer group; ETM; mismatch; genotyping; 

KW gene expression; ss. 

XX 

OS Synthetic. 
XX 

PN WO200107665-A2. 
XX 

PD 01-FEB-2001. 

XX 

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX 

PA (CLIN-) CLINICAL MICRO SENSORS INC. 
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 

PT a single surface 
XX 

PS Example 6; Page 127; 159pp; English. 
XX 

CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic 

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping,

CC monitoring gene expression. 

XX 

SQ Sequence 938 BP; 4 A; 144 C; 9 G; 5 T; 776 other; 



Query Match 8.3%; Score 34.8; DB 22; Length 938; 

Best Local Similarity 1.7%; Pred. No. 0.18; 

Matches 6; Conservative 195; Mismatches 147; Indels 0; Gaps 0;

Qy 20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79

: : : : I : : : : ::::::::::::::: : : :::::::::::::: 
Db 192 wwwwgwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 251 

Qy 80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139 

Db 252 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwcwwwwwwwwwwwwwwwww 311 



Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199

Db 312 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 371

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259

Db 372 wwwwwwwwwwwwwwwwwwwwgcttawwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 431

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319

Db 432 wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 491

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367

Db 492 wwcwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww 539



RESULT 7 
AAH07451/C 

ID AAH07451 standard; cDNA; 786 BP. 
XX 

AC AAH07451; 
XX 

DT 26-JUN-2001 (first entry) 
XX 

DE Human cDNA clone (5'-primer) SEQ ID NO: 4286. 
XX 

KW Human; primer; detection; diagnosis; antisense therapy; gene therapy; 
XX 

OS Homo sapiens. 
XX 

PN EP1074617-A2. 
XX 

PD 07-FEB-2001. 
XX 

PF 28-JUL-2000; 2000EP-0116126.
XX

PR 29-JUL-1999; 99JP-0248036.

PR 27-AUG-1999; 99JP-0300253.

PR 11-JAN-2000; 2000JP-0118776.

PR 02-MAY-2000; 2000JP-0183767.

PR 09-JUN-2000; 2000JP-0241899.
XX 

PA (HELI-) HELIX RES INST. 
XX 

PI Ota T, Isogai T, Nishikawa T, Hayashi K, Saito K, Yamamoto J; 

PI Ishii S, Sugiyama T, Wakamatsu A, Nagai K, Otsuki T; 

XX 

DR WPI; 2001-318749/34. 
XX 

PT Primer sets for synthesizing polynucleotides, particularly the 5602 

PT full-length cDNAs defined in the specification, and for the detection 

PT and/or diagnosis of the abnormality of the proteins encoded by the 

PT full-length cDNAs - 
XX 

PS Claim 1; SEQ ID 4286; 2537pp + CD ROM; English.
XX 

CC The present invention describes primer sets for synthesising 5602 



CC full-length cDNAs defined in the specification. Where a primer set 

CC comprises: (a) an oligo-dT primer and an oligonucleotide complementary 

CC to the complementary strand of a polynucleotide which comprises one of 

CC the 5602 nucleotide sequences defined in the specification, where the 

CC oligonucleotide comprises at least 15 nucleotides; or (b) a combination 

CC of an oligonucleotide comprising a sequence complementary to the 

CC complementary strand of a polynucleotide which comprises a 5'-end

CC sequence and an oligonucleotide comprising a sequence complementary to a

CC polynucleotide which comprises a 3'-end sequence, where the

CC oligonucleotide comprises at least 15 nucleotides and the combination of

CC the 5'-end sequence/3'-end sequence is selected from those defined in

CC the specification. The primer sets can be used in antisense therapy and

CC in gene therapy. The primers are useful for synthesising polynucleotides,

CC particularly full-length cDNAs. The primers are also useful for the

CC detection and/or diagnosis of the abnormality of the proteins encoded by

CC the full-length cDNAs. The primers allow obtaining of the full-length

CC cDNAs easily without any specialised methods. AAH03166 to AAH13628 and 

CC AAH13633 to AAH18742 represent human cDNA sequences; AAB92446 to 

CC AAB95893 represent human amino acid sequences; and AAH13629 to AAH13632 

CC represent oligonucleotides, all of which are used in the exemplification 

CC of the present invention. 

XX 

SQ Sequence 786 BP; 252 A; 156 C; 161 G; 211 T; 6 other; 



Query Match 8.1%; Score 34.2; DB 22; Length 786; 

Best Local Similarity 58.3%; Pred. No. 0.26; 

Matches 60; Conservative 0; Mismatches 43; Indels 0; Gaps 0; 

Qy 195 agatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcct 254 

I I I I I I II I I I I I I I I I I I I M I I I I I I I I I I I I I 

Db 295 AAATGCATCTTTTGCTGCAAACGCAGCTGGTGTGTAAGTATCACACTGAGCCATTAGCCA 236

Qy 255 atctgacatccaactgcagaagaaagaggctcaaggctttttt 297

I I I I III II I I I I I I I II I I I I I I I 
Db 235 ATCTCCCATTAAACTTTTTAAGTAGGGAGCCAACTGTTTCTTT 193 



RESULT 8 
AAH16513/C 

ID AAH16513 standard; cDNA; 1753 BP. 
XX 

AC AAH16513; 
XX 

DT 26-JUN-2001 (first entry) 
XX 

DE Human cDNA sequence SEQ ID NO: 15552. 
XX 

KW Human; primer; detection; diagnosis; antisense therapy; gene therapy; ss. 
XX 

OS Homo sapiens.
XX 

PN EP1074617-A2. 
XX 

PD 07-FEB-2001. 
XX 

PF 28-JUL-2000; 2000EP-0116126.
XX

PR 29-JUL-1999; 99JP-0248036.

PR 27-AUG-1999; 99JP-0300253.

PR 11-JAN-2000; 2000JP-0118776.

PR 02-MAY-2000; 2000JP-0183767.

PR 09-JUN-2000; 2000JP-0241899.
XX 

PA (HELI-) HELIX RES INST. 
XX 

PI Ota T, Isogai T, Nishikawa T, Hayashi K, Saito K, Yamamoto J; 

PI Ishii S, Sugiyama T, Wakamatsu A, Nagai K, Otsuki T; 

XX 

DR WPI; 2001-318749/34. 
XX 

PT Primer sets for synthesizing polynucleotides, particularly the 5602 

PT full-length cDNAs defined in the specification, and for the detection 

PT and/or diagnosis of the abnormality of the proteins encoded by the 

PT full-length cDNAs - 
XX 

PS Claim 8; SEQ ID 15552; 2537pp + CD ROM; English. 
XX 

CC The present invention describes primer sets for synthesising 5602 

CC full-length cDNAs defined in the specification. Where a primer set 

CC comprises: (a) an oligo-dT primer and an oligonucleotide complementary 

CC to the complementary strand of a polynucleotide which comprises one of 

CC the 5602 nucleotide sequences defined in the specification, where the 

CC oligonucleotide comprises at least 15 nucleotides; or (b) a combination 

CC of an oligonucleotide comprising a sequence complementary to the 

CC complementary strand of a polynucleotide which comprises a 5'-end

CC sequence and an oligonucleotide comprising a sequence complementary to a

CC polynucleotide which comprises a 3'-end sequence, where the

CC oligonucleotide comprises at least 15 nucleotides and the combination of

CC the 5'-end sequence/3'-end sequence is selected from those defined in

CC the specification. The primer sets can be used in antisense therapy and

CC in gene therapy. The primers are useful for synthesising polynucleotides,

CC particularly full-length cDNAs. The primers are also useful for the

CC detection and/or diagnosis of the abnormality of the proteins encoded by 

CC the full-length cDNAs. The primers allow obtaining of the full-length 

CC cDNAs easily without any specialised methods. AAH03166 to AAH13628 and 

CC AAH13633 to AAH18742 represent human cDNA sequences; AAB92446 to 

CC AAB95893 represent human amino acid sequences; and AAH13629 to AAH13632 

CC represent oligonucleotides, all of which are used in the exemplification 

CC of the present invention. 

XX 

SQ Sequence 1753 BP; 535 A; 382 C; 359 G; 477 T; 0 other; 



Query Match 8.1%; Score 34.2; DB 22; Length 1753; 

Best Local Similarity 58.3%; Pred. No. 0.38; 

Matches 60; Conservative 0; Mismatches 43; Indels 0; Gaps 0 

Qy 195 agatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcct 254
       | |||||        ||   ||| |||||   || |||||||  |||| |   | ||||
Db 295 AAATGCATCTTTTGCTGCAAACGCAGCTGGTGTGTAAGTATCACACTGAGCCATTAGCCA 236

Qy 255 atctgacatccaactgcagaagaaagaggctcaaggctttttt 297
       ||||  |||  ||||    ||| | |  ||  |  | || |||
Db 235 ATCTCCCATTAAACTTTTTAAGTAGGGAGCCAACTGTTTCTTT 193


RESULT 9
AAF58252/C

ID AAF58252 standard; DNA; 936 BP.
XX

AC AAF58252;
XX

DT 24-APR-2001 (first entry)
XX

DE Oligonucleotide DloJo.
XX

KW Electron-transfer group; ETM; mismatch; genotyping;

KW gene expression; ss.

XX

OS Synthetic.
XX

PN WO200107665-A2.
XX

PD 01-FEB-2001.
XX

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX

PA (CLIN-) CLINICAL MICRO SENSORS INC.
XX

PI Umek RM;
XX

DR WPI; 2001-159728/16.
XX

PT Nucleic acids containing electron-transfer group, useful as labels in

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on

PT a single surface -
XX

PS Example 6; Page 127; 159pp; English.
XX

CC The present invention relates to a composition comprising two nucleic

CC acids each containing an electron-transfer group (ETM) having

CC different redox potentials. The invention is used for electronic

CC detection of nucleic acids, especially of substitutions (mismatches)

CC and single-nucleotide polymorphisms, e.g. for genotyping,

CC monitoring gene expression.

XX

SQ Sequence 936 BP; 4 A; 139 C; 10 G; 7 T; 776 other;

Query Match 8.0%; Score 33.6; DB 22; Length 936;

Best Local Similarity 1.7%; Pred. No. 0.44;

Matches 6; Conservative 194; Mismatches 148; Indels 0; Gaps 0;


Qy  20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79
Db 694 WWWWGWWWWWWWWWWWWWWWWWWWWWWWWWWWW 635

Qy  80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139
Db 634 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 575

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199
Db 574 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 515

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259
Db 514 WWWWWWWWWWWWWWWWWWWWGWWWWWWWW 455

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319
Db 454 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 395

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367
Db 394 AGCWWWWWWWWWWWWWWWWWWWWWWWWWW 347





RESULT 10 
AAF58254/C 

ID AAF58254 standard; DNA; 936 BP. 
XX 

AC AAF58254; 
XX 

DT 24-APR-2001 (first entry) 
XX 

DE Oligonucleotide D1875. 
XX 

KW Electron-transfer group; ETM; mismatch; genotyping; 

KW gene expression; ss. 

XX 

OS Synthetic. 
XX 

PN WO200107665-A2. 
XX 

PD 01-FEB-2001. 

XX 

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.
PR 17-MAR-2000; 2000US-0190259.
XX 

PA (CLIN-) CLINICAL MICRO SENSORS INC. 
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 
PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 
PT a single surface 
XX 

PS Example 6; Page 127; 159pp; English. 
XX 



CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic 

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 

XX 

SQ Sequence 936 BP; 4 A; 144 C; 7 G; 5 T; 776 other; 

Query Match 8.0%; Score 33.6; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.44; 

Matches 6; Conservative 194; Mismatches 148; Indels 0; Gaps 0; 

Qy  20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79
Db 694 WWWWGWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 635

Qy  80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139
Db 634 WWWWWWWWWWWWWWWWWWWWWWWWW 575

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199
Db 574 WWWWWWWWWWWWWWWWWWWWWWWWWWWWW 515

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259
Db 514 WWWWWWWWWWWWWWWWWWWWGWWWWWW 455

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319
Db 454 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 395

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367
Db 394 AGCWWWWWWWWWWWWWWWWWWWWWWWWWWWW 347
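
The header counts of an entry can be tied back to its Score. The sketch below is a hedged illustration only: it assumes, purely from the figures printed in this report and not from any GenCore documentation, that the IDENTITY_NUC table awards 1 per identical base, awards 0.6 per "Conservative" pairing (such as W against a or t), and subtracts 0.6 per mismatch. The function name and the weight constants are hypothetical.

```python
# Assumed weights, inferred from the numbers printed in this report.
# They are not taken from any documented definition of IDENTITY_NUC.
MATCH = 1.0
CONSERVATIVE = 0.6
MISMATCH = -0.6


def score_from_counts(matches, conservative, mismatches):
    """Recompute an ungapped alignment score from the header counts."""
    total = (matches * MATCH
             + conservative * CONSERVATIVE
             + mismatches * MISMATCH)
    return round(total, 1)


# Counts copied from two entries of this report.
print(score_from_counts(6, 194, 148))   # header reports Score 33.6
print(score_from_counts(103, 0, 120))   # header reports Score 31
```

Entries whose database sequence contains `n` may not follow these weights exactly, so the function should be read as a consistency check rather than a scoring rule.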



RESULT 11 
AAF58257/c 

ID AAF58257 standard; DNA; 936 BP. 
XX 

AC AAF58257; 
XX 

DT 24-APR-2001 (first entry) 
XX 

DE Oligonucleotide D1954. 
XX 

KW Electron-transfer group; ETM; mismatch; genotyping; 

KW gene expression; ss. 

XX 

OS Synthetic. 
XX

PN WO200107665-A2. 
XX 

PD 01-FEB-2001. 



XX 

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX

PA (CLIN-) CLINICAL MICRO SENSORS INC.
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 

PT a single surface 
XX 

PS Example 6; Page 127; 159pp; English. 
XX 

CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic 

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 

XX 

SQ Sequence 936 BP; 5 A; 142 C; 7 G; 6 T; 776 other; 



Query Match 8.0%; Score 33.6; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.44; 



Matches 6; Conservative 194; Mismatches 148; Indels 0; Gaps 0;

Qy  20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79
Db 694 WWWWGWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 635

Qy  80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139
Db 634 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWCWWWWWWWWWWWWWWWWW 575

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199
Db 574 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 515

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259
Db 514 WWWWWWWWWWWWWWWWWWWWGWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 455

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319
Db 454 WWWWWWWWWWWWWWWWWWWWWWWWW 395

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367
Db 394 AGCWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 347





RESULT 12 
AAF58259/C 

ID AAF58259 standard; DNA; 936 BP. 
XX 

AC AAF58259; 
XX 

DT 24-APR-2001 (first entry) 
XX 

DE Oligonucleotide D2004. 
XX 

KW Electron-transfer group; ETM; mismatch; genotyping; 

KW gene expression; ss. 

XX 

OS Synthetic. 
XX 

PN WO200107665-A2.
XX

PD 01-FEB-2001.
XX

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX 

PA (CLIN-) CLINICAL MICRO SENSORS INC. 
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 

PT a single surface 
XX 

PS Example 6; Page 128; 159pp; English. 
XX 

CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic 

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 

XX 

SQ Sequence 936 BP; 6 A; 138 C; 8 G; 8 T; 776 other; 



Query Match 8.0%; Score 33.6; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.44; 

Matches 6; Conservative 194; Mismatches 148; Indels 0; Gaps 0; 

Qy  20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79
       : : : : | : : : : ::::::::::::::: : : ::::::::::::::
Db 694 WWWWGWWWWWWWWWWWWWWWWWWWWWWWWWWWW 635

Qy  80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139
Db 634 WWWWWWWWWWWWWWWWWWWWWWWWWWW 575

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199
Db 574 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 515

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259
Db 514 WWWWWWWWWWWWWWWWWWWWGWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 455

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319
Db 454 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 395

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367
Db 394 AGCWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 347





RESULT 13 
AAF58262/C 

ID AAF58262 standard; DNA; 936 BP. 
XX 

AC AAF58262;
XX 

DT 24-APR-2001 (first entry) 
XX 

DE Oligonucleotide D2007. 
XX 

KW Electron-transfer group; ETM; mismatch; genotyping; 

KW gene expression; ss. 

XX 

OS Synthetic. 
XX 

PN WO200107665-A2. 
XX 

PD 01-FEB-2001. 
XX 

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX

PA (CLIN-) CLINICAL MICRO SENSORS INC.
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 

PT a single surface 
XX 

PS Example 6; Page 128; 159pp; English. 
XX 

CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic 



CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 
XX 

SQ Sequence 936 BP; 5 A; 139 C; 10 G; 6 T; 776 other; 

Query Match 8.0%; Score 33.6; DB 22; Length 936; 

Best Local Similarity 1.7%; Pred. No. 0.44; 

Matches 6; Conservative 194; Mismatches 148; Indels 0; Gaps 0; 



Qy  20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79
       : : : : | : : : : ::::::::::::::: : : ::::::::::::::
Db 694 WWWWGWWWWWWWWWWWWWWWWWWWWWWWWWWW 635

Qy  80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139
Db 634 WWWWWWWWWWWWWWWWWWWWWWWWWW 575

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199
Db 574 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 515

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259
Db 514 WWWWWWWWWWWWWWWWWWWWGWWWWWW 455

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319
Db 454 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 395

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367
Db 394 AGCWWWWWWWWWWWWWWWWWWWWWWWWWWW 347





RESULT 14 
AAF58255/c 

ID AAF58255 standard; DNA; 938 BP. 
XX 

AC AAF58255; 
XX 

DT 24-APR-2001 (first entry) 
XX 

DE Oligonucleotide D1876. 
XX 

KW Electron-transfer group; ETM; mismatch; genotyping; 

KW gene expression; ss. 

XX 

OS Synthetic. 
XX 

PN WO200107665-A2. 
XX 

PD 01-FEB-2001. 
XX 

PF 26-JUL-2000; 2000WO-US20476.
XX

PR 26-JUL-1999; 99US-0145695.

PR 17-MAR-2000; 2000US-0190259.
XX 

PA (CLIN-) CLINICAL MICRO SENSORS INC. 
XX 

PI Umek RM; 
XX 

DR WPI; 2001-159728/16. 
XX 

PT Nucleic acids containing electron-transfer group, useful as labels in 

PT hybridization assays, e.g. for genotyping, allowing repeat analyses on 

PT a single surface 
XX 

PS Example 6; Page 127; 159pp; English. 
XX 

CC The present invention relates to a composition comprising two nucleic 

CC acids each containing an electron-transfer group (ETM) having 

CC different redox potentials. The invention is used for electronic 

CC detection of nucleic acids, especially of substitutions (mismatches) 

CC and single-nucleotide polymorphisms, e.g. for genotyping, 

CC monitoring gene expression. 

XX 

SQ Sequence 938 BP; 4 A; 144 C; 9 G; 5 T; 776 other; 



Query Match 8.0%; Score 33.6; DB 22; Length 938; 

Best Local Similarity 1.7%; Pred. No. 0.44; 



Matches 6; Conservative 194; Mismatches 148; Indels 0; Gaps 0;

Qy  20 aattgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgc 79
Db 694 WWWWGWWWWWWWWWWWWWWWWWWWWWWWWWW 635

Qy  80 cttttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacg 139
Db 634 WWWWWWWWWWWWWWWWWWWWWWWW 575

Qy 140 tcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatg 199
Db 574 WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW 515

Qy 200 caaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttcagcctatctg 259
Db 514 WWWWWWWWWWWWWWWWWWWWGWWWWWWWW 455

Qy 260 acatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacgtctctgttca 319
Db 454 WWWWWWWWWWWWWWWWWWWWWWWWWWW 395

Qy 320 atcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367
Db 394 AGCWWWWWWWWWWWWWWWWWWWWWWWWWWWW 347





RESULT 15 
AAX98756 

ID AAX98756 standard; cDNA; 752 BP. 



XX 

AC AAX98756; 
XX 

DT 24-SEP-1999 (first entry) 
XX 

DE Human validated cancer cell derived cDNA #78. 
XX 

KW Cancer; human; colon; breast; lung; transmembrane receptor; ATPase; 

KW integral membrane protein; aspartyl protease; GATA family; wnt family; 

KW transcription factor; G-protein alpha subunit; protein phosphatase; 

KW phorbolester binding protein; diacylglycerol binding protein; trypsin; 

KW protein kinase; tyrosine phosphatase; developmental signalling protein; 

KW WW/rsp5/WWP domain; therapy; forensic; genetic mapping; diagnostic; 

KW detection; treatment; cervical; melanoma; colorectal adenocarcinoma; 

KW Wilm's tumour; retinoblastoma; sarcoma; myosarcoma; lung carcinoma; 

KW leukemia; lymphoma; dysplasia; hyperplasia; endometrium; adrenal; 

KW prostate; ss.
XX

OS Homo sapiens.
XX

PN WO9933982-A2.
XX 

PD 08-JUL-1999. 
XX 

PF 22-DEC-1998; 98WO-US27610.
XX

PR 21-DEC-1998; 98US-0217471.

PR 23-DEC-1997; 97US-0068755.

PR 03-APR-1998; 98US-0080664.

PR 21-OCT-1998; 98US-0105234.

PR 27-OCT-1998; 98US-0105877.
XX

PA (CHIR ) CHIRON CORP.

PA (HYSE-) HYSEQ INC.
XX

PI Crkvenjakov R, Dickson M, Drmanac R, Drmanac S;

PI Escobedo J, Garcia PD, Garcia V, Giese K, Innis MA;

PI Jones LW, Kassam A, Kennedy GC, Kita D, Labat I;

PI Lamson G, Leshkowitz D, Pot D, Randazzo F, Reinhard C;

PI Stache-Crain B, Sudduth-Klinger J, Williams LT;

XX

DR WPI; 1999-430243/36.
XX

PT New isolated human polynucleotides
XX

PS Claim 1; Page 444-445; 591pp; English.
XX

CC This invention describes novel isolated human polynucleotides obtained

CC by screening for differential expression in colon cancer, breast cancer

CC and lung cancer cell lines. The polynucleotides of the invention are

CC represented in AAX98275-X99118 and encode polypeptides of protein

CC families selected from 4 transmembrane segments integral membrane

CC proteins, 7 transmembrane receptors, ATPases associated with various

CC cellular activities (AAA), eukaryotic aspartyl proteases, GATA family of

CC transcription factors, G-protein alpha subunit, phorbolesters or

CC diacylglycerol binding proteins, protein kinase, protein phosphatase 2C,

CC protein tyrosine phosphatase, trypsin, wnt family of developmental



CC signalling proteins and WW/rsp5/WWP domain containing proteins. The 

CC encoded polypeptides also have a functional domain selected from Ank 

CC repeat, basic region plus leucine zipper transcription factors, 

CC bromodomain, EF-hand, SH3 domain, WD domain/G-beta repeats, zinc finger 

CC (C2H2 type), zinc finger (CCHC class), and zinc-binding metalloprotease 

CC domain. The polynucleotides encode polypeptides with similarity to known 

CC protein families and are predicted to have similar properties. The novel 

CC polynucleotides can be used to develop products for use as therapeutic 

CC agents and in forensics, genetic analysis, mapping and diagnostic 

CC applications. In particular, the product can be used for the detection 

CC and management of cancers. They can be used for treating e.g. cervical 

CC cancers, melanomas, colorectal adenocarcinomas, Wilm's tumour, sarcomas, 

CC retinoblastoma, myosarcomas, lung carcinomas, leukemias, such as chronic 

CC myelogenous leukemia, promyelocytic leukemia, monocytic leukemia, and 

CC myeloid leukemia, and lymphomas such as histiocytic lymphoma, anhydric 

CC hereditary ectodermal dysplasia, congenital alveolar dysplasia, 

CC epithelial dysplasia of the cervix, fibrous dysplasia of bone, and 

CC mammary dysplasia, hyperplasias, e.g. endometrial, adrenal, breast, 

CC prostate or thyroid hyperplasias or pseudoepitheliomatous hyperplasia of 

CC the skin. 

XX 

SQ Sequence 752 BP; 204 A; 191 C; 166 G; 173 T; 18 other; 



Query Match 7.8%; Score 33; DB 20; Length 752; 

Best Local Similarity 67.2%; Pred. No. 0.63; 

Matches 45; Conservative 0; Mismatches 22; Indels 0; Gaps 0;

Qy 141 cctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgc 200
       |||||  ||   ||| | ||||||   || || ||||  ||  | ||||| ||||| | |
Db 540 cctgtaatcccagcactttgggaagcaaangtggcaggatcattccagcccaggagtttc 599

Qy 201 aaggcca 207
       |||  ||
Db 600 aaganca 606
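
The header figures for this hit can be cross-checked against the alignment itself. The following is a minimal Python sketch, assuming that Best Local Similarity is the number of identical columns divided by the number of aligned columns and that Query Match is the Score divided by the Perfect score of 421. Both definitions are readings of the numbers in this report, not documented ones, and `alignment_stats` is a hypothetical helper.

```python
def alignment_stats(query, subject, score, perfect_score):
    """Summarise an ungapped pairwise alignment like the report header."""
    if len(query) != len(subject):
        raise ValueError("ungapped alignment rows must have equal length")
    # Case-insensitive identity count over the aligned columns.
    matches = sum(q.lower() == s.lower() for q, s in zip(query, subject))
    return {
        "matches": matches,
        "mismatches": len(query) - matches,
        "best_local_similarity": round(100.0 * matches / len(query), 1),
        "query_match": round(100.0 * score / perfect_score, 1),
    }


# Both alignment rows of RESULT 15, concatenated (Qy 141-207, Db 540-606).
qy = ("cctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgc"
      "aaggcca")
db = ("cctgtaatcccagcactttgggaagcaaangtggcaggatcattccagcccaggagtttc"
      "aaganca")

stats = alignment_stats(qy, db, score=33, perfect_score=421)
print(stats)
```

Under these assumptions the helper reproduces the 45 matches, 22 mismatches, 67.2% and 7.8% printed in the header for this entry.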



Search completed: February 7, 2002, 10:59:37 
Job time: 4963 sec 

GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 

Run on: February 7, 2002, 10:51:34 ; Search time 172.96 Seconds 

(without alignments)
551.268 Million cell updates/sec

Title: US-09-394-745-5950

Perfect score: 421 

Sequence: 1 gggtccaggcacgcgtccga agtggcagaatttgtgccgc 421 



Scoring table: IDENTITY_NUC

Gapop 10.0 , Gapext 1.0

Searched: 351203 seqs, 113238999 residues



Total number of hits satisfying chosen parameters: 702406 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 

Database : Issued_Patents_NA: * 

1 : /cgn2_6/ptodata/2/ina/5A_COMB.seq: *

2 : /cgn2_6/ptodata/2/ina/5B_COMB.seq: *

3 : /cgn2_6/ptodata/2/ina/6A_COMB.seq: *

4 : /cgn2_6/ptodata/2/ina/6B_COMB.seq: *

5 : /cgn2_6/ptodata/2/ina/PCTUS_COMB.seq: *

6 : /cgn2_6/ptodata/2/ina/backfiles1.seq: *

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
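
The throughput figure in this header can be reproduced from the other header values. The sketch below assumes that one "cell update" is one query-residue by database-residue comparison and that both strands of every database sequence were searched (which would also explain why the hit count is twice the number of sequences). These are interpretations of the header, and the function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Estimate dynamic-programming throughput in millions of cells/second."""
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6


# Values copied from the header of this search.
rate = million_cell_updates_per_sec(421, 113_238_999, 172.96)
hits = 2 * 351_203  # assumed: each sequence is scored on both strands

print(f"{rate:.3f} million cell updates/sec, {hits} hits")
```

With these inputs the estimate agrees with the reported 551.268 and with the reported 702406 hits.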

SUMMARIES 

                %
Result        Query
  No.  Score  Match Length  DB  ID                 Description

c   1     31    7.4   2878   1  US-07-903-456-1    Sequence 1, Appli
c   2     31    7.4   2878   3  US-08-666-221B-5   Sequence 5, Appli
c   3     31    7.4   2878   3  US-08-666-221B-11  Sequence 11, Appl
c   4     31    7.4   2878   3  US-08-666-221B-13  Sequence 13, Appl
    5   30.6    7.3   2646   1  US-08-539-304A-5   Sequence 5, Appli
c   6   30.6    7.3   6253   2  US-08-627-151A-5   Sequence 5, Appli
c   7     30    7.1   2381   2  US-08-318-826A-9   Sequence 9, Appli
c   8     30    7.1   2400   6  5215909-13         Patent No. 5215909
c   9     30    7.1   2416   2  US-08-318-826A-8   Sequence 8, Appli
c  10     30    7.1   2416   4  US-09-334-489-1    Sequence 1, Appli
c  11     30    7.1   2416   4  US-09-334-489-2    Sequence 2, Appli
   12     29    6.9   2204   1  US-08-221-817-12   Sequence 12, Appl
   13     29    6.9   2204   1  US-08-454-439-12   Sequence 12, Appl
   14     29    6.9   2204   5  PCT-US94-10487-12  Sequence 12, Appl
   15     29    6.9   2206   1  US-08-221-817-10   Sequence 10, Appl
   16     29    6.9   2206   1  US-08-454-439-10   Sequence 10, Appl
   17     29    6.9   2206   5  PCT-US94-10487-10  Sequence 10, Appl
   18     29    6.9   2848   4  US-08-464-954A-2   Sequence 2, Appli
   19   28.8    6.8   2447   2  US-09-014-969-14   Sequence 14, Appl
c  20   28.8    6.8   9636   1  US-08-323-170B-1   Sequence 1, Appli
c  21   28.6    6.8    221   1  US-07-792-525B-1   Sequence 1, Appli
c  22   28.6    6.8   1915   3  US-09-120-365-2    Sequence 2, Appli
c  23   28.6    6.8   1915   4  US-09-515-039-2    Sequence 2, Appli
c  24   28.6    6.8   1916   3  US-09-120-365-88   Sequence 88, Appl
c  25   28.6    6.8   1916   4  US-09-515-039-88   Sequence 88, Appl
c  26   27.8    6.6    213   5  PCT-US95-13333-1   Sequence 1, Appli
c  27   27.8    6.6   1776   1  US-08-464-523B-4   Sequence 4, Appli
c  28   27.6    6.6   1207   2  US-08-694-869-3    Sequence 3, Appli



(Summary entries 29 to 45 are illegible in the source scan.)

RESULT 1
US-07-903-456-1/C 

; Sequence 1, Application US/07903456 
; Patent No. 5574144 

GENERAL INFORMATION: 

APPLICANT: KAMBOJ, Rajender 

APPLICANT: ELLIOTT, Candace 

APPLICANT: NUTT, Stephen

TITLE OF INVENTION: KAINATE-BINDING HUMAN CNS RECEPTORS OF
TITLE OF INVENTION: THE EAA4 FAMILY 
NUMBER OF SEQUENCES: 9 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Foley & Lardner 

STREET: 1800 Diagonal Road, Suite 500 

CITY: Alexandria 

STATE: VA

COUNTRY: USA 

ZIP: 22313-0299 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/07/903,456 

FILING DATE: 19920624 

CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 

NAME: BENT, Stephen A. 

REGISTRATION NUMBER: 29,768 

REFERENCE/DOCKET NUMBER: 16777/183/ALLE
TELECOMMUNICATION INFORMATION:

TELEPHONE: (703)836-9300

TELEFAX: (703)683-4109

TELEX: 899149 



INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2878 base pairs 
TYPE: NUCLEIC ACID 
STRANDEDNESS: double
TOPOLOGY: linear
MOLECULE TYPE: cDNA
FEATURE:

NAME/KEY: sig_peptide
LOCATION: 134..226
FEATURE:

NAME/KEY: mat_peptide
LOCATION: 227..2860
FEATURE:

NAME/KEY: CDS
LOCATION: 134..2860
US-07-903-456-1 



Query Match 7.4%; Score 31; DB 1; Length 2878; 

Best Local Similarity 46.2%; Pred. No. 1.1; 

Matches 103; Conservative 0; Mismatches 120; Indels 0; Gaps 0; 

Qy  60 gattaaatgtcaacatttgccttttcgcgttccaattactaatgttacggcattattcag 119
       ||||| |||  |    |||||| ||  | || |  ||| ||| | |   ||||  ||
Db 814 GATTACATGAAACTCCTTGCCTCTTTTCATTTCTTTTAGTAAGGGTTTTGCATCCTTTGT 755

Qy 120 gacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagag 179
         ||| |  | ||||     |  || ||  ||     |      | |||   |||  |||
Db 754 ATCAGCAGGTAACTGACGAATTTTGAGTCGAAGATTATACCTTGATGGAGCTTTGATGAG 695

Qy 180 tctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctga 239
           ||||  | |       ||  ||  |      ||  | | |||  |    |
Db 694 CTCTTGCAAACGAATGAGACCAGTGCTGTCATCATACACAACCGTGACGGTTTTCCACTT 635

Qy 240 ctggggtttcagcctatctgacatccaactgcagaagaaagag 282
              | || |   ||| | ||   || || ||   |||||
Db 634 GAAAAACTGCACCAGGTCTAAAATGGCACGGCTGAGTGAAGAG 592
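
In entries whose identifier ends in `/C` (and whose summary line carries a `c` flag), the Db coordinates run downward, here from 814 to 592. A plausible reading, which this report does not state outright, is that the query matched the complementary strand and the Db row shows that strand as aligned. Under that assumption, the stored forward-strand segment can be recovered by reverse-complementing the Db row. The sketch below is illustrative; the table covers the IUPAC nucleotide codes, including the `W` that appears in other entries.

```python
# Complement table for IUPAC nucleotide codes, upper and lower case.
_CODES = "ACGTRYKMBDHVSWN"
_COMPS = "TGCAYRMKVHDBSWN"
_TABLE = str.maketrans(_CODES + _CODES.lower(), _COMPS + _COMPS.lower())


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_TABLE)[::-1]


# Last Db row of this alignment (displayed coordinates 634 down to 592).
db_row = "GAAAAACTGCACCAGGTCTAAAATGGCACGGCTGAGTGAAGAG"
forward = reverse_complement(db_row)  # assumed to be positions 592..634
print(forward)
```

Applying the function twice returns the original row, which is a quick sanity check on the table.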



RESULT 2 
US-08-666-221B-5/C 

; Sequence 5, Application US/08666221B 

; Patent No. 6136544 

; GENERAL INFORMATION: 

APPLICANT: Kamboj, Rajender

APPLICANT: Nutt, Stephen 

TITLE OF INVENTION: GLUTAMATE RECEPTOR (OR EAA RECEPTOR) 
TITLE OF INVENTION: POLYNUCLEOTIDES AND THEIR USES 
NUMBER OF SEQUENCES: 32 
CORRESPONDENCE ADDRESS: 
; ADDRESSEE: Foley & Lardner 

STREET: 3000 K Street, N.W., Suite 500 

CITY: Washington 

STATE: D.C. 

COUNTRY: USA 

ZIP: 20007-5109
COMPUTER READABLE FORM:

MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/666,221B
FILING DATE: 20-JUN-1996 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 
APPLICATION NUMBER: 
FILING DATE: 
ATTORNEY/AGENT INFORMATION: 
NAME: Bent, Stephen A. 
REGISTRATION NUMBER: 29,768 
REFERENCE/DOCKET NUMBER: 016777/0308 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (202) 672-5300 
TELEFAX: (202) 672-5399 
TELEX: 904136 
INFORMATION FOR SEQ ID NO: 5: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2878 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear
MOLECULE TYPE: cDNA
FEATURE:

NAME/KEY: sig_peptide
LOCATION: 134..226
FEATURE:

NAME/KEY: mat_peptide
LOCATION: 227..2860
FEATURE:

NAME/KEY: CDS
LOCATION: 134..2860
US-08-666-221B-5 



Query Match 7.4%; Score 31; DB 3; Length 2878; 

Best Local Similarity 46.2%; Pred. No. 1.1; 

Matches 103; Conservative 0; Mismatches 120; Indels 0; Gaps 0; 

Qy  60 gattaaatgtcaacatttgccttttcgcgttccaattactaatgttacggcattattcag 119
       ||||| |||  |    |||||| ||  | || |  ||| ||| | |   ||||  ||
Db 814 GATTACATGAAACTCCTTGCCTCTTTTCATTTCTTTTAGTAAGGGTTTTGCATCCTTTGT 755

Qy 120 gacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagag 179
         ||| |  | ||||     |  || ||  ||     |      | |||   |||  |||
Db 754 ATCAGCAGGTAACTGACGAATTTTGAGTCGAAGATTATACCTTGATGGAGCTTTGATGAG 695

Qy 180 tctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctga 239
           ||||  | |       ||  ||  |      ||  | | |||  |    |
Db 694 CTCTTGCAAACGAATGAGACCAGTGCTGTCATCATACACAACCGTGACGGTTTTCCACTT 635

Qy 240 ctggggtttcagcctatctgacatccaactgcagaagaaagag 282
              | || |   ||| | ||   || || ||   |||||
Db 634 GAAAAACTGCACCAGGTCTAAAATGGCACGGCTGAGTGAAGAG 592
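
The three statistics lines that precede each alignment follow a regular "name value;" pattern, so they can be collected into a dictionary for comparison across hits. This is a minimal sketch using only Python's standard `re` module; the pattern and the `parse_stats` helper are hypothetical and assume the lines have already been cleaned of scanning artefacts.

```python
import re

# A field is a label (letters, dots, spaces), a number, an optional
# percent sign, and a terminating semicolon.
FIELD = re.compile(r"([A-Za-z][A-Za-z. ]*?)\s+(-?\d+(?:\.\d+)?)%?\s*;")


def parse_stats(lines):
    """Collect 'name value;' pairs from alignment header lines."""
    stats = {}
    for line in lines:
        for name, value in FIELD.findall(line):
            stats[name.strip()] = float(value)
    return stats


header = [
    "Query Match 7.4%; Score 31; DB 3; Length 2878;",
    "Best Local Similarity 46.2%; Pred. No. 1.1;",
    "Matches 103; Conservative 0; Mismatches 120; Indels 0; Gaps 0;",
]
parsed = parse_stats(header)
print(parsed["Score"], parsed["Pred. No."], parsed["Matches"])
```

A parsed header makes it easy to confirm, for example, that Matches and Mismatches add up to the 223 aligned positions shown above.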



RESULT 3 
US-08-666-221B-11/C 

Sequence 11, Application US/08666221B 
Patent No. 6136544 
GENERAL INFORMATION:

APPLICANT: Kamboj, Rajender
APPLICANT: Nutt, Stephen 

TITLE OF INVENTION: GLUTAMATE RECEPTOR (OR EAA RECEPTOR) 
TITLE OF INVENTION: POLYNUCLEOTIDES AND THEIR USES 
NUMBER OF SEQUENCES: 32 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Foley & Lardner 
STREET: 3000 K Street, N.W., Suite 500 
CITY: Washington 
STATE: D.C.
COUNTRY: USA 
ZIP: 20007-5109 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/666,221B
FILING DATE: 20-JUN-1996
CLASSIFICATION: 435
PRIOR APPLICATION DATA: 
APPLICATION NUMBER: 
FILING DATE: 
ATTORNEY/AGENT INFORMATION: 
NAME: Bent, Stephen A. 
REGISTRATION NUMBER: 29,768 
REFERENCE/DOCKET NUMBER: 016777/0308 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (202) 672-5300 
TELEFAX: (202) 672-5399 
TELEX: 904136 
INFORMATION FOR SEQ ID NO: 11: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2878 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear
MOLECULE TYPE: cDNA
FEATURE:

NAME/KEY: sig_peptide
LOCATION: 134..226
FEATURE:

NAME/KEY: mat_peptide
LOCATION: 227..2860
FEATURE:

NAME/KEY: CDS
LOCATION: 134..2860
US-08-666-221B-11 



Query Match 7.4%; Score 31; DB 3; Length 2878; 

Best Local Similarity 46.2%; Pred. No. 1.1; 

Matches 103; Conservative 0; Mismatches 120; Indels 0; Gaps 0; 

Qy  60 gattaaatgtcaacatttgccttttcgcgttccaattactaatgttacggcattattcag 119
       ||||| |||  |    |||||| ||  | || |  ||| ||| | |   ||||  ||
Db 814 GATTACATGAAACTCCTTGCCTCTTTTCATTTCTTTTAGTAAGGGTTTTGCATCCTTTGT 755

Qy 120 gacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagag 179
         ||| |  | ||||     |  || ||  ||     |      | |||   |||  |||
Db 754 ATCAGCAGGTAACTGACGAATTTTGAGTCGAAGATTATACCTTGATGGAGCTTTGATGAG 695

Qy 180 tctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctga 239
           ||||  | |       ||  ||  |      ||  | | |||  |    |
Db 694 CTCTTGCAAACGAATGAGACCAGTGCTGTCATCATACACAACCGTGACGGTTTTCCACTT 635

Qy 240 ctggggtttcagcctatctgacatccaactgcagaagaaagag 282
              | || |   ||| | ||   || || ||   |||||
Db 634 GAAAAACTGCACCAGGTCTAAAATGGCACGGCTGAGTGAAGAG 592



RESULT 4 
US-08-666-221B-13/c 

; Sequence 13, Application US/08666221B 
; Patent No. 6136544 
; GENERAL INFORMATION: 
; APPLICANT: Kamboj, Rajender
APPLICANT: Nutt, Stephen 

TITLE OF INVENTION: GLUTAMATE RECEPTOR (OR EAA RECEPTOR) 
TITLE OF INVENTION: POLYNUCLEOTIDES AND THEIR USES 
NUMBER OF SEQUENCES: 32 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Foley & Lardner

STREET: 3000 K Street, N.W., Suite 500 
; CITY: Washington 

STATE: D.C. 

COUNTRY: USA 

ZIP: 20007-5109
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/666,221B

FILING DATE: 20-JUN-1996 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 

FILING DATE: 
ATTORNEY/AGENT INFORMATION: 

NAME: Bent, Stephen A. 

REGISTRATION NUMBER: 29,768 

REFERENCE/DOCKET NUMBER: 016777/0308 
TELECOMMUNICATION INFORMATION: 



TELEPHONE: (202) 672-5300
TELEFAX: (202) 672-5399 
TELEX: 904136 
INFORMATION FOR SEQ ID NO: 13: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2878 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear
MOLECULE TYPE: cDNA
FEATURE:

NAME/KEY: sig_peptide
LOCATION: 134..226
FEATURE:

NAME/KEY: mat_peptide
LOCATION: 227..2860
FEATURE:

NAME/KEY: CDS
LOCATION: 134..2860
US-08-666-221B-13 



Query Match 7.4%; Score 31; DB 3; Length 2878; 

Best Local Similarity 46.2%; Pred. No. 1.1; 

Matches 103; Conservative 0; Mismatches 120; Indels 0; Gaps 0; 

Qy     60 gattaaatgtcaacatttgccttttcgcgttccaattactaatgttacggcattattcag 119
          ||||| |||  |    |||||| ||  | || |  ||| ||| | |   ||||  ||
Db    814 GATTACATGAAACTCCTTGCCTCTTTTCATTTCTTTTAGTAAGGGTTTTGCATCCTTTGT 755

Qy    120 gacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagag 179
            ||| |  | ||||     |  || ||  ||     |      | |||   |||  |||
Db    754 ATCAGCAGGTAACTGACGAATTTTGAGTCGAAGATTATACCTTGATGGAGCTTTGATGAG 695

Qy    180 tctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctga 239
              ||||  | |       ||  ||  |      ||  | | |||  |    |
Db    694 CTCTTGCAAACGAATGAGACCAGTGCTGTCATCATACACAACCGTGACGGTTTTCCACTT 635

Qy    240 ctggggtttcagcctatctgacatccaactgcagaagaaagag 282
                 | || |   ||| | ||   || || ||   |||||
Db    634 GAAAAACTGCACCAGGTCTAAAATGGCACGGCTGAGTGAAGAG 592



RESULT 5 
US-08-539-304A-5 

; Sequence 5, Application US/08539304A 
; Patent No. 5792933 
; GENERAL INFORMATION: 

APPLICANT: MA, DIN-POW

TITLE OF INVENTION: FIBER-SPECIFIC PROTEIN EXPRESSION IN THE 
TITLE OF INVENTION: COTTON PLANT 
NUMBER OF SEQUENCES: 7 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: OBLON, SPIVAK, MCCLELLAND, MAIER & NEUSTADT 

STREET: 1755 JEFFERSON DAVIS HWY. SUITE 400 

CITY: ARLINGTON 

STATE: VA 



COUNTRY: USA 
ZIP: 22202 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/539,304A
FILING DATE: 04-OCT-1995 
CLASSIFICATION: 800 
ATTORNEY/AGENT INFORMATION: 
NAME: NORMAN, OBLON F 
REGISTRATION NUMBER: 24,618 
REFERENCE/DOCKET NUMBER: 2343-037-27 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 703-413-3000 
TELEFAX: 703-413-2220 
INFORMATION FOR SEQ ID NO: 5: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2646 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
FEATURE: 

NAME/KEY: CDS

LOCATION: join(741..1093,1220..1226)
FEATURE:

NAME/KEY: intron
LOCATION: 1094..1219
US-08-539-304A-5 



Query Match 7.3%; Score 30.6; DB 1; Length 2646; 

Best Local Similarity 60.0%; Pred. No. 1.5; 

Matches 51; Conservative 0; Mismatches 34; Indels 0; Gaps 0; 

Qy    147 ttcaatgcattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgcaaggcc 206
          | || ||| | || | || |||||| ||| |  ||||  |  |||    | ||||| | |
Db    858 TGCATTGCTTACTTGAAAGGGAATGGTGCTGGTTCTGCTCCCCCAGCTTGCTGCAACGGC 917

Qy    207 atttgtggacgaagctgtgctgcaa 231
          ||  |    |  | || ||| || |
Db    918 ATCAGATCTCTCAACTCTGCCGCCA 942



RESULT 6 
US-08-627-151A-5/c

; Sequence 5, Application US/08627151A 
; Patent No. 5866341 
; GENERAL INFORMATION: 

APPLICANT: SPINELLA, Dominic 

APPLICANT: BECHERER, Kathleen 

APPLICANT: BROWN, Steven 

TITLE OF INVENTION: COMPOSITIONS AND METHODS FOR 
TITLE OF INVENTION: SCREENING DRUG LIBRARIES 



NUMBER OF SEQUENCES: 19 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Gen-Probe Incorporated 

STREET: 10210 Genetic Center Drive 

CITY: San Diego 

STATE: CA 

COUNTRY: USA 

ZIP: 92121 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Diskette 

COMPUTER: IBM Compatible 

OPERATING SYSTEM: DOS 

SOFTWARE: FastSEQ for Windows Version 2.0 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/627,151A

FILING DATE: 03-APR-1996 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 

FILING DATE: 
ATTORNEY/AGENT INFORMATION: 

NAME: Fisher, Carlos A 

REGISTRATION NUMBER: 36,510 

REFERENCE/DOCKET NUMBER: CBI016 
TELECOMMUNICATION INFORMATION: 
; TELEPHONE: 619-410-8926

TELEFAX: 619-410-8928 

TELEX: 

; INFORMATION FOR SEQ ID NO: 5: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 6253 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
US-08-627-151A-5 



Query Match 7.3%; Score 30.6; DB 2; Length 6253; 

Best Local Similarity 53.8%; Pred. No. 2.3; 

Matches 63; Conservative 0; Mismatches 54; Indels 0; Gaps 0; 

Qy    185 gcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactggg 244
          ||| |  | |    ||| | |  |  | | |  | || ||||||| |  | |    || |
Db   5791 GCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGATGCG 5732

Qy    245 gtttcagcctatctgacatccaactgcagaagaaagaggctcaaggcttttttgaac 301
          |          ||||||| | | | |  ||   |||  ||| |  ||||||||| ||
Db   5731 GCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCAC 5675



RESULT 7 
US-08-318-826A-9/c

; Sequence 9, Application US/08318826A 
; Patent No. 5891725 
; GENERAL INFORMATION: 
; APPLICANT: Soreq, Hermona 
APPLICANT: Zakut, Haim 



APPLICANT: Eckstein, Fritz 

TITLE OF INVENTION: Synthetic Antisense 

TITLE OF INVENTION: Oligodeoxynucleotides and Pharmaceutical Compositions 
TITLE OF INVENTION: Containing Them 
NUMBER OF SEQUENCES: 9 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Kohn & Associates 

STREET: 30500 Northwestern Hwy., Suite 410

CITY: Farmington Hills 

STATE: Michigan 

COUNTRY: US 

ZIP: 48334 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/318,826A

FILING DATE: 

CLASSIFICATION: 514 
ATTORNEY/AGENT INFORMATION: 

NAME: Kohn, Kenneth I. 

REGISTRATION NUMBER: 30,955 

REFERENCE/DOCKET NUMBER: 2391.00001 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (248) 539-5050 

TELEFAX: (248) 539-5055 
INFORMATION FOR SEQ ID NO: 9: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 2381 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double

TOPOLOGY: linear 
MOLECULE TYPE: cDNA to mRNA 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
ORIGINAL SOURCE: 

ORGANISM: Homo sapiens 

DEVELOPMENTAL STAGE: fetal 

TISSUE TYPE: Brain, Liver 
POSITION IN GENOME: 

MAP POSITION: 3q26 
FEATURE:
NAME/KEY: mat_peptide
LOCATION: 160..1881
IDENTIFICATION METHOD: experimental
OTHER INFORMATION: /EC_number= 3.1.1.8
OTHER INFORMATION: /evidence= EXPERIMENTAL
OTHER INFORMATION: /gene= "BCHE"
OTHER INFORMATION: /note= "butyrylcholinesterase mature peptide"
FEATURE:
NAME/KEY: sig_peptide
LOCATION: 76..159
FEATURE:
NAME/KEY: mRNA
LOCATION: 1..2381
FEATURE:
NAME/KEY: CDS
LOCATION: 76..1884
US-08-318-826A-9 



Query Match 7.1%; Score 30; DB 2; Length 2381; 

Best Local Similarity 50.5%; Pred. No. 2.2; 

Matches 98; Conservative 0; Mismatches 95; Indels 1; Gaps 1;

Qy    126 actttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtg 185
          | |||  | |     | |   | | || ||||||||  | ||||  ||| ||  || |||
Db   1485 AATTTCATAGCCATGCATCACTCCCATCCATTCTGGCCACGGAAGTTTGGAGGATCGGTG 1426

Qy    186 cagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactgggg 245
               ||    |   || | ||||  |   | |  ||| |   |   |   |||
Db   1425 TTCAAAATAGTAGAAAAAGGCATTATTTCCCCATTCTGAG-AACTTCTTGGTGAACTCCA 1367

Qy    246 tttcagcctatctgacatccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
             |||   || ||| ||   | | |  || ||     | |||||| |    | | |  |
Db   1366 AGGCAGGGCATATGAAATTATAATCCCCAACAACATCACCCAAGGCCTCACGGTAGTTTT 1307

Qy    306 cacgtctctgttca 319
          || ||||||| |||
Db   1306 CAGGTCTCTGATCA 1293



RESULT 8 
5215909-13/c 
;Patent No. 5215909 

APPLICANT: SOREQ, HERMONA 

TITLE OF INVENTION: HUMAN CHOLINESTERASE GENES 
NUMBER OF SEQUENCES: 13 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/07/572,911 

FILING DATE: 15-AUG-1990 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 87,724

FILING DATE: 21-AUG-1987 

APPLICATION NUMBER: 875,737 

FILING DATE: 18-JUN-1986 
;SEQ ID NO:13: 

LENGTH: 2400
5215909-13 



Query Match 7.1%; Score 30; DB 6; Length 2400; 

Best Local Similarity 50.5%; Pred. No. 2.2; 

Matches 98; Conservative 0; Mismatches 95; Indels 1; Gaps 1;

Qy    126 actttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtg 185
          | |||  | |     | |   | | || ||||||||  | ||||  ||| ||  || |||
Db   1485 AATTTCATAGCCATGCATCACTCCCATCCATTCTGGCCACGGAAGTTTGGAGGATCGGTG 1426

Qy    186 cagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactgggg 245
               ||    |   || | ||||  |   | |  ||| |   |   |   |||
Db   1425 TTCAAAATAGTAGAAAAAGGCATTATTTCCCCATTCTGAG-AACTTCTTGGTGAACTCCA 1367

Qy    246 tttcagcctatctgacatccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
             |||   || ||| ||   | | |  || ||     | |||||| |    | | |  |
Db   1366 AGGCAGGGCATATGAAATTATAATCCCCAACAACATCACCCAAGGCCTCACGGTAGTTTT 1307

Qy    306 cacgtctctgttca 319
          || ||||||| |||
Db   1306 CAGGTCTCTGATCA 1293



RESULT 9 
US-08-318-826A-8/c

; Sequence 8, Application US/08318826A 
; Patent No. 5891725 
; GENERAL INFORMATION: 

APPLICANT: Soreq, Hermona 

APPLICANT: Zakut, Haim 

APPLICANT: Eckstein, Fritz 

TITLE OF INVENTION: Synthetic Antisense 

TITLE OF INVENTION: Oligodeoxynucleotides and Pharmaceutical Compositions 
TITLE OF INVENTION: Containing Them 
NUMBER OF SEQUENCES: 9 
CORRESPONDENCE ADDRESS: 
; ADDRESSEE: Kohn & Associates 

STREET: 30500 Northwestern Hwy., Suite 410
; CITY: Farmington Hills 

STATE: Michigan 

COUNTRY: US 

ZIP: 48334 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/318,826A

FILING DATE: 

CLASSIFICATION: 514 
ATTORNEY/AGENT INFORMATION: 

NAME: Kohn, Kenneth I. 
; REGISTRATION NUMBER: 30,955

REFERENCE/DOCKET NUMBER: 2391.00001 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (248) 539-5050 

TELEFAX: (248) 539-5055 
; INFORMATION FOR SEQ ID NO: 8: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 2416 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: double 

TOPOLOGY: linear 
MOLECULE TYPE: cDNA to mRNA 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
ORIGINAL SOURCE: 

ORGANISM: Homo sapiens 
; TISSUE TYPE: Basal ganglion 



POSITION IN GENOME: 
MAP POSITION: 3q26
FEATURE:
NAME/KEY: mat_peptide
LOCATION: 214..1935
OTHER INFORMATION: /EC_number= 3.1.1.8
OTHER INFORMATION: /gene= "BCHE"
OTHER INFORMATION: /note= "butyrylcholinesterase mature peptide"
FEATURE:
NAME/KEY: sig_peptide
LOCATION: 130..213
FEATURE:
NAME/KEY: mRNA
LOCATION: 1..2416
FEATURE:
NAME/KEY: CDS
LOCATION: 130..1938
US-08-318-826A-8



Query Match 7.1%; Score 30; DB 2; Length 2416; 

Best Local Similarity 50.5%; Pred. No. 2.2; 

Matches 98; Conservative 0; Mismatches 95; Indels 1; Gaps 1; 

Qy    126 actttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtg 185
          | |||  | |     | |   | | || ||||||||  | ||||  ||| ||  || |||
Db   1539 AATTTCATAGCCATGCATCACTCCCATCCATTCTGGCCACGGAAGTTTGGAGGATCGGTG 1480

Qy    186 cagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactgggg 245
               ||    |   || | ||||  |   | |  ||| |   |   |   |||
Db   1479 TTCAAAATAGTAGAAAAAGGCATTATTTCCCCATTCTGAG-AACTTCTTGGTGAACTCCA 1421

Qy    246 tttcagcctatctgacatccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
             |||   || ||| ||   | | |  || ||     | |||||| |    | | |  |
Db   1420 AGGCAGGGCATATGAAATTATAATCCCCAACAACATCACCCAAGGCCTCACGGTAGTTTT 1361

Qy    306 cacgtctctgttca 319
          || ||||||| |||
Db   1360 CAGGTCTCTGATCA 1347



RESULT 10 
US-09-334-489-1/c

; Sequence 1, Application US/09334489 
; Patent No. 6291175 
; GENERAL INFORMATION: 

APPLICANT: Pierre Sevigny 
; APPLICANT: Keith Schappert 

APPLICANT: Heiko Wiesbusch 
; TITLE OF INVENTION: METHODS FOR TREATING A NEUROLOGICAL 
; TITLE OF INVENTION: DISEASE BY DETERMINING BCHE GENOTYPE 

FILE REFERENCE: 08523/013002 
; CURRENT APPLICATION NUMBER: US/09/334,489
; CURRENT FILING DATE: 1999-06-16 
; PRIOR APPLICATION NUMBER: 60/089,406 
; PRIOR FILING DATE: 1998-06-18 
; NUMBER OF SEQ ID NOS: 8



SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 1 

LENGTH: 2416 

TYPE: DNA 
; ORGANISM: Homo sapiens 
US-09-334-489-1 



Query Match 7.1%; Score 30; DB 4; Length 2416; 

Best Local Similarity 50.5%; Pred. No. 2.2; 

Matches 98; Conservative 0; Mismatches 95; Indels 1; Gaps 1;

Qy    126 actttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtg 185
          | |||  | |     | |   | | || ||||||||  | ||||  ||| ||  || |||
Db   1539 AATTTCATAGCCATGCATCACTCCCATCCATTCTGGCCACGGAAGTTTGGAGGATCGGTG 1480

Qy    186 cagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactgggg 245
               ||    |   || | ||||  |   | |  ||| |   |   |   |||
Db   1479 TTCAAAATAGTAGAAAAAGGCATTATTTCCCCATTCTGAG-AACTTCTTGGTGAACTCCA 1421

Qy    246 tttcagcctatctgacatccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
             |||   || ||| ||   | | |  || ||     | |||||| |    | | |  |
Db   1420 AGGCAGGGCATATGAAATTATAATCCCCAACAACATCACCCAAGGCCTCACGGTAGTTTT 1361

Qy    306 cacgtctctgttca 319
          || ||||||| |||
Db   1360 CAGGTCTCTGATCA 1347



RESULT 11 
US-09-334-489-2/c

; Sequence 2, Application US/09334489 

; Patent No. 6291175 

; GENERAL INFORMATION: 

; APPLICANT: Pierre Sevigny 

; APPLICANT: Keith Schappert 

; APPLICANT: Heiko Wiesbusch 

; TITLE OF INVENTION: METHODS FOR TREATING A NEUROLOGICAL 

; TITLE OF INVENTION: DISEASE BY DETERMINING BCHE GENOTYPE 

; FILE REFERENCE: 08523/013002 

; CURRENT APPLICATION NUMBER: US/09/334,489 

; CURRENT FILING DATE: 1999-06-16 

; PRIOR APPLICATION NUMBER: 60/089,406 

PRIOR FILING DATE: 1998-06-18 
; NUMBER OF SEQ ID NOS: 8

SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 2 

LENGTH: 2416 
TYPE: DNA 

ORGANISM: Homo sapiens 
US-09-334-489-2 



Query Match 7.1%; Score 30; DB 4; Length 2416; 

Best Local Similarity 50.5%; Pred. No. 2.2; 

Matches 98; Conservative 0; Mismatches 95; Indels 1; Gaps 1;



Qy    126 actttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtg 185
          | |||  | |     | |   | | || ||||||||  | ||||  ||| ||  || |||
Db   1539 AATTTCATAGCCATGCATCACTCCCATCCATTCTGGCCACGGAAGTTTGGAGGATCGGTG 1480

Qy    186 cagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactgggg 245
               ||    |   || | ||||  |   | |  ||| |   |   |   |||
Db   1479 TTCAAAATAGTAGAAAAAGGCATTATTTCCCCATTCTGAG-AACTTCTTGGTGAACTCCA 1421

Qy    246 tttcagcctatctgacatccaactgcagaagaaagaggctcaaggcttttttgaactcat 305
             |||   || ||| ||   | | |  || ||     | |||||| |    | | |  |
Db   1420 AGGCAGGGCATATGAAATTATAATCCCCAACAACATCACCCAAGGCCTCACGGTAGTTTT 1361

Qy    306 cacgtctctgttca 319
          || ||||||| |||
Db   1360 CAGGTCTCTGATCA 1347



RESULT 12 
US-08-221-817-12 

; Sequence 12, Application US/08221817 
; Patent No. 5532151 
; GENERAL INFORMATION: 

APPLICANT: Chantry, David 

APPLICANT: Gray, Patrick W. 

APPLICANT: Hoekstra, Merle F. 

TITLE OF INVENTION: A Novel G Protein-Coupled Receptor
TITLE OF INVENTION: Kinase GRK6 
NUMBER OF SEQUENCES: 24 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Marshall, O'Toole, Gerstein, Murray &

ADDRESSEE: Borun 

STREET: 6300 Sears Tower, 233 South Wacker Drive 

CITY: Chicago 

STATE: Illinois 

COUNTRY: USA 

ZIP: 60606 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/221,817 

FILING DATE: 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 08/123,932 

FILING DATE: 17 SEP 1993 
ATTORNEY/AGENT INFORMATION: 

NAME: Noland, Greta E.

REGISTRATION NUMBER: 35,302 

REFERENCE/DOCKET NUMBER: 31981 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (312) 474-6300 

TELEFAX: (312) 474-0448 

TELEX: 25-3856 
; INFORMATION FOR SEQ ID NO: 12: 



SEQUENCE CHARACTERISTICS: 
LENGTH: 2204 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 
LOCATION: 31..1758
US-08-221-817-12 



Query Match 6.9%; Score 29; DB 1; Length 2204; 

Best Local Similarity 61.0%; Pred. No. 4.7; 

Matches 47; Conservative 0; Mismatches 30; Indels 0; Gaps 0; 

Qy    122 cagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtc 181
          |||| ||  ||||||| | || |    |    | | ||   || ||| || ||||||| |
Db   1634 CAGACCTGGACTGGAAGGGCCAGCCACCTGCACCTCCTAAAAAGGGACTGCTGCAGAGAC 1693

Qy    182 tgtgcagccaaggagat 198
          | | ||| |    ||||
Db   1694 TCTTCAGTCGCCAAGAT 1710



RESULT 13 
US-08-454-439-12 

; Sequence 12, Application US/08454439 
; Patent No. 5591618 
; GENERAL INFORMATION: 

APPLICANT: Chantry, David 

APPLICANT: Gray, Patrick W. 

APPLICANT: Hoekstra, Merle F. 

TITLE OF INVENTION: A Novel G Protein-Coupled Receptor
TITLE OF INVENTION: Kinase GRK6
NUMBER OF SEQUENCES: 24
CORRESPONDENCE ADDRESS: 
; ADDRESSEE: Marshall, O'Toole, Gerstein, Murray & 

; ADDRESSEE: Borun 

STREET: 6300 Sears Tower, 233 South Wacker Drive 

CITY: Chicago 

STATE: Illinois 

COUNTRY: USA 

ZIP: 60606
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/454,439

FILING DATE: 30-MAY-1995

CLASSIFICATION: 435
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/221,817 

FILING DATE: 31-MAR-1994 

APPLICATION NUMBER: 08/123,932 



FILING DATE: 17 SEP 1993 

CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 

NAME: Noland, Greta E.

REGISTRATION NUMBER: 35,302 

REFERENCE/DOCKET NUMBER: 31981 
TELECOMMUNICATION INFORMATION:

TELEPHONE: (312) 474-6300

TELEFAX: (312) 474-0448 

TELEX: 25-3856 
; INFORMATION FOR SEQ ID NO: 12: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 2204 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 

LOCATION: 31..1758
US-08-454-439-12 



Query Match 6.9%; Score 29; DB 1; Length 2204;

Best Local Similarity 61.0%; Pred. No. 4.7;

Matches 47; Conservative 0; Mismatches 30; Indels 0; Gaps 0;

Qy    122 cagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtc 181
          |||| ||  ||||||| | || |    |    | | ||   || ||| || ||||||| |
Db   1634 CAGACCTGGACTGGAAGGGCCAGCCACCTGCACCTCCTAAAAAGGGACTGCTGCAGAGAC 1693

Qy    182 tgtgcagccaaggagat 198
          | | ||| |    ||||
Db   1694 TCTTCAGTCGCCAAGAT 1710



RESULT 14 
PCT-US94-10487-12 

; Sequence 12, Application PC/TUS9410487 
; GENERAL INFORMATION: 

APPLICANT: ICOS Corporation 

TITLE OF INVENTION: A Novel G Protein-Coupled Receptor 
TITLE OF INVENTION: Kinase GRK6 
NUMBER OF SEQUENCES: 24 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Marshall, O'Toole, Gerstein, Murray & 
; ADDRESSEE: Borun

STREET: 6300 Sears Tower, 233 South Wacker Drive 
; CITY: Chicago 

; STATE: Illinois 

COUNTRY: USA 

ZIP: 60606 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: PCT/US94/10487

FILING DATE: 

CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 08/221,817 

FILING DATE: 31 MAR 1994 

CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 08/123,932 

FILING DATE: 17 SEP 1993 

CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 

NAME: Noland, Greta E. 

REGISTRATION NUMBER: 35,302 

REFERENCE/DOCKET NUMBER: 27866/31981 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (312) 474-6300 

TELEFAX: (312) 474-0448 

TELEX: 25-3856 
INFORMATION FOR SEQ ID NO: 12: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 2204 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 

LOCATION: 31..1758
PCT-US94-10487-12 



Query Match 6.9%; Score 29; DB 5; Length 2204; 

Best Local Similarity 61.0%; Pred. No. 4.7; 

Matches 47; Conservative 0; Mismatches 30; Indels 0; Gaps 0;

Qy    122 cagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtc 181
          |||| ||  ||||||| | || |    |    | | ||   || ||| || ||||||| |
Db   1634 CAGACCTGGACTGGAAGGGCCAGCCACCTGCACCTCCTAAAAAGGGACTGCTGCAGAGAC 1693

Qy    182 tgtgcagccaaggagat 198
          | | ||| |    ||||
Db   1694 TCTTCAGTCGCCAAGAT 1710



RESULT 15 
US-08-221-817-10 

; Sequence 10, Application US/08221817 
; Patent No. 5532151 
; GENERAL INFORMATION: 

APPLICANT: Chantry, David 

APPLICANT: Gray, Patrick W. 

APPLICANT: Hoekstra, Merle F. 

TITLE OF INVENTION: A Novel G Protein-Coupled Receptor
TITLE OF INVENTION: Kinase GRK6 
NUMBER OF SEQUENCES: 24 



CORRESPONDENCE ADDRESS: 

ADDRESSEE: Marshall, O'Toole, Gerstein, Murray &
; ADDRESSEE: Borun 

STREET: 6300 Sears Tower, 233 South Wacker Drive 
; CITY: Chicago 

; STATE: Illinois 

COUNTRY: USA 
ZIP: 60606 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/221,817 
FILING DATE: 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 08/123,932 
FILING DATE: 17 SEP 1993 
ATTORNEY/AGENT INFORMATION: 

NAME: Noland, Greta E.
REGISTRATION NUMBER: 35,302 
REFERENCE/DOCKET NUMBER: 31981 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (312) 474-6300
TELEFAX: (312) 474-0448 
TELEX: 25-3856 
; INFORMATION FOR SEQ ID NO: 10: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2206 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS
LOCATION: 31..1926
US-08-221-817-10 



Query Match 6.9%; Score 29; DB 1; Length 2206; 

Best Local Similarity 61.0%; Pred. No. 4.7; 

Matches 47; Conservative 0; Mismatches 30; Indels 0; Gaps 0;

Qy    122 cagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtc 181
          |||| ||  ||||||| | || |    |    | | ||   || ||| || ||||||| |
Db   1637 CAGACCTGGACTGGAAGGGCCAGCCACCTGCACCTCCTAAAAAGGGACTGCTGCAGAGAC 1696

Qy    182 tgtgcagccaaggagat 198
          | | ||| |    ||||
Db   1697 TCTTCAGTCGCCAAGAT 1713



Search completed: February 7, 2002, 10:51:40 
Job time: 6066 sec 



GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: February 7, 2002, 08:20:37 ; Search time 4942.22 Seconds 

(without alignments) 
915.373 Million cell updates/sec 

Title: US-09-394-745-5950

Perfect score: 421 

Sequence: 1 gggtccaggcacgcgtccga agtggcagaatttgtgccgc 421



Scoring table: IDENTITY_NUC

Gapop 10.0 , Gapext 1.0

Searched: 11351937 seqs, 5372889281 residues 



Total number of hits satisfying chosen parameters: 22703874 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 



Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database : EST:* 

1: em_estfun:*
2: em_esthum:*
3: em_estin:*
4: em_estom:*
5: em_estpl:*
6: em_estba:*
7: em_estro:*
8: em_estov:*
9: em_htc:*
10: gb_est1:*
11: gb_est2:*
12: gb_htc:*
13: gb_gss:*
14: em_gss_fun:*
15: em_gss_hum:*
16: em_gss_inv:*
17: em_gss_pln:*
18: em_gss_pro:*
19: em_gss_rod:*
20: em_gss_vrt:*
21: em_gss_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
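
The Query Match and Best Local Similarity percentages printed with each hit in this listing appear to follow from the score and the match counts. The sketch below is an inference from the printed values, not something the program states: it assumes Query Match is the score divided by the perfect score, and Best Local Similarity is the number of matches divided by the number of aligned columns (matches + mismatches + indels). The function names are hypothetical.

```python
def query_match_percent(score: float, perfect_score: float) -> float:
    """Score expressed as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_percent(matches: int, mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all aligned columns."""
    aligned_columns = matches + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


if __name__ == "__main__":
    # Inputs copied from one hit in this listing: Score 193.6 against a
    # perfect score of 421, with 248 matches, 54 mismatches and 2 indels.
    print(query_match_percent(193.6, 421))
    print(best_local_similarity_percent(248, 54, 2))
```

Under these assumptions the two calls can be compared with the "Query Match" and "Best Local Similarity" fields printed for the same hit.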



SUMMARIES 



Result
   No.   Score  Match  Length  DB  ID        Description
                    8     615  13  AQ918240  AQ918240 RPCI-23-2
    41    32.8    7.8     735  13  AZ374574  AZ374574 1M0127K24
    42    32.8    7.8    1025  11  BG872410  BG872410 602792712
c   43    32.6    7.7     432  13  AQ584680  AQ584680 RPCI-11-4
c   44    32.6    7.7     573  10  BE204987  BE204987 EST397663
    45    32.6    7.7     702  10  AL135286  AL135286 DKFZp762D

ALIGNMENTS 



RESULT 1 
AQ330565 

LOCUS AQ330565 578 bp DNA GSS 08-JAN-1999

DEFINITION nbxb0047D17f CUGI Rice BAC Library Oryza sativa genomic clone
nbxb0047D17f, DNA sequence.
ACCESSION AQ330565 
VERSION AQ330565.1 GI:4122415 

KEYWORDS GSS.
SOURCE Oryza sativa. 

ORGANISM Oryza sativa 

Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; 

Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; 

Ehrhartoideae; Oryzeae; Oryza. 
REFERENCE 1 (bases 1 to 578) 
AUTHORS Wing, R. A. and Dean, R. A. 

TITLE A BAC End Sequencing Framework to Sequence the Rice Genome 

JOURNAL Unpublished (1998) 
COMMENT Contact: Wing RA 

Clemson University Genomics Institute 

Clemson University 

100 Jordan Hall, Clemson, SC 29634, USA 

Tel: 864 656 7288 

Fax: 864 656 4293 

Email: rwing@clemson.edu

Seq primer: TAATACGACTCACTATAGGG 

Class: BAC ends 

High quality sequence stop: 292. 
FEATURES Location/Qualifiers 
source 1..578

/organism="Oryza sativa"
/strain="Japonica"
/cultivar="Nipponbare"
/db_xref="taxon:4530"
/clone="nbxb0047D17f"
/clone_lib="CUGI Rice BAC Library"
/tissue_type="Leaf"
/lab_host="E. coli DH10B"

/note="Vector: pBeloBAC11; Site_1: HindIII; Site_2:
HindIII; Rice is one of two most popular grains in the
world. Half of the world population especially those 
inhabiting highly populated areas of the humid tropics 
and subtropics, rely on rice as their primary source of 
carbohydrate. Monocotyledonous rice is a diploid plant 
(2n=24) with a haploid genome equivalent of 431 Mbp
(Arumuganathan and Earle, 1991). The relatively small
genome of rice, three times larger than that of 
Arabidopsis, makes it suitable for genomic studies. In 
order to facilitate positional cloning, physical mapping 
and genome sequencing of rice, we have constructed a BAC 
library from Oryza sativa, Nipponbare variety. The 
library contains 36,864 clones with an average insert size 
of 128.5 Kb providing 10.9 haploid genome equivalents. The 
deep coverage allows the isolation a particular sequence 
with a probability of 99.9 %. Two high density filters, 
each containing 18,432 clones (doubly spotted), represent 
the whole library for colony screening." 

BASE COUNT 152 a 114 c 111 g 201 t 

ORIGIN 



Query Match 46.0%; Score 193.6; DB 13; Length 578; 

Best Local Similarity 81.6%; Pred. No. 3.8e-47; 

Matches 248; Conservative 0; Mismatches 54; Indels 2; Gaps 2;



Qy     71 aacatttgccttttcgcgttccaattactaatgttacg-gcattattcaggacagaactt 129
          | ||||  || ||   | | |||| || |||| ||| | |||||| |||||||| |||||
Db     25 AGCATTGACCCTTCATCATGCCAAATAATAATCTTATGTGCATTACTCAGGACAAAACTT 84

Qy    130 tactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgcagc 189
           ||||||| ||||| ||||||||||||| ||||||| | |||||||||||||||||| ||
Db     85 CACTGGAAAGTCCTATGTTCAATGCATTTTGGGAAAAGGATGTTGCAGAGTCTGTGC-GC 143

Qy    190 caaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactggggtttc 249
          ||||||||||||  ||| ||||| || |||||||| ||||||||||| || ||||| |||
Db    144 CAAGGAGATGCACAGCCTTTTGTAGAGGAAGCTGTACTGCAAGTATCAGATTGGGGATTC 203

Qy    250 agcctatctgacatccaactgcagaagaaagaggctcaaggcttttttgaactcatcacg 309
          ||| | || ||||| ||| ||||||||| ||||| ||   ||||||||||| | ||||
Db    204 AGCTTGTCAGACATTCAAATGCAGAAGAGAGAGGATCTGAGCTTTTTTGAATTGATCAAA 263

Qy    310 tctctgttcaatcatgctgaaaaacagtgggtgggatttctgggcccaatacatatatcg 369
          ||||| |||  ||| ||||||    |||||||||||||||||||||||||||| |||| |
Db    264 TCTCTATTCCGTCAGGCTGAACGGGAGTGGGTGGGATTTCTGGGCCCAATACACATATGG 323

Qy    370 cagg 373
          ||||
Db    324 CAGG 327

RESULT 2 

AW448782 

LOCUS AW448782 742 bp mRNA EST 03-JAN-2001
DEFINITION BRY_1421 BRY Triticum aestivum cDNA clone P35-10, mRNA sequence.
ACCESSION AW448782
VERSION AW448782.1 GI:12019317
KEYWORDS EST.
SOURCE bread wheat.
ORGANISM Triticum aestivum
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; Pooideae
; Triticeae; Triticum.
REFERENCE 1 (bases 1 to 742)
AUTHORS Clarke,B.C., Hobbs,M. and Appels,R.
TITLE Genes active in developing wheat endosperm
JOURNAL Unpublished (2000)
COMMENT Contact: Bryan Clarke
Division of Plant Industry
C.S.I.R.O.
GPO Box 1600, Canberra, ACT, Australia
Tel: 61 2 6246 5054
Fax: 61 2 6246 5000
Email: bryanc@pi.csiro.au.
FEATURES Location/Qualifiers
source 1..742
/organism="Triticum aestivum"
/cultivar="Wyuna"
/db_xref="taxon:4565"
/clone="P35-10"
/clone_lib="BRY"
/cell_type="endosperm"
BASE COUNT 180 a 172 c 218 g 169 t 3 others

ORIGIN 



Query Match 34.8%; Score 146.6; DB 10; Length 742; 

Best Local Similarity 76.8%; Pred. No. 4.1e-33; 

Matches 179; Conservative 0; Mismatches 54; Indels 0; Gaps 0; 

Qy    188 gccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactggggtt 247
          |||  ||||||||    |||||||| || |||||||||||||| |||||||| ||||| |
Db      3 GCCCGGGAGATGCGCAACCATTTGTAGAGGAAGCTGTGCTGCACGTATCTGATTGGGGAT 62

Qy    248 tcagcctatctgacatccaactgcagaagaaagaggctcaaggcttttttgaactcatca 307
          ||||  | || ||||| ||  ||||||||||||||| ||| ||  | |||||| | ||||
Db     63 TCAGTTTGTCAGACATTCACATGCAGAAGAAAGAGGATCAGGGAGTATTTGAATTTATCA 122

Qy    308 cgtctctgttcaatcatgctgaaaaacagtgggtgggatttctgggcccaatacatatat 367
           ||||||| ||| ||| ||||||  | |||||||||||||||||| |||||| || || |
Db    123 AGTCTCTGATCAGTCAGGCTGAACGAGAGTGGGTGGGATTTCTGGTCCCAATCCACATCT 182

Qy    368 cgcaggggatagatgaccgagtgatctcgccctcagtggcagaatttgtgccg 420
           | |||| || |||||||| ||| |  | || || | | | || ||||  | |
Db    183 GGTAGGGAATGGATGACCGGGTGGTGCCCCCATCGGCGACCGAGTTTGCCCGG 235



RESULT 3 

BG238542 

LOCUS BG238542 462 bp mRNA EST 13-FEB-2001
DEFINITION sab48e05.y1 Gm-c1043 Glycine max cDNA clone GENOME SYSTEMS CLONE
ID: Gm-c1043-2529 5' similar to TR:Q9SF34 Q9SF34 F11F8.28 PROTEIN.
;, mRNA sequence.
ACCESSION BG238542
VERSION BG238542.1 GI:12773615
KEYWORDS EST.
SOURCE soybean.
ORGANISM Glycine max
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
Rosidae; eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae
; Glycine.
REFERENCE 1 (bases 1 to 462)
AUTHORS Shoemaker,R., Keim,P., Vodkin,L., Erpelding,J., Coryell,V., Khanna
,A., Bolla,B., Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C.,
Wylie,T., Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers
,Y., Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk
,R., Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann
,R., Waterston,R. and Wilson,R.
TITLE Public Soybean EST Project
JOURNAL Unpublished (1999)
COMMENT Contact: Shoemaker R/Public Soybean EST Project
Public Soybean EST Project
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
Tel: 314 286 1800
Fax: 314 286 1810
Email: est@watson.wustl.edu
This clone is available through: Genome Systems, Inc. 4633 World
Parkway Circle St. Louis, Missouri 63134 For further information
call: (800) 430-0030 or (314) 427-3222 FAX: (888) 919-3324 or (314)
427-3324 or contact: clones@genomesystems.com or
info@genomesystems.com web site: www.genomesystems.com
High quality sequence stop: 397.
FEATURES Location/Qualifiers
source 1..462
/organism="Glycine max"
/db_xref="taxon:3847"
/clone="GENOME SYSTEMS CLONE ID: Gm-c1043-2529"
/clone_lib="Gm-c1043"
/tissue_type="Hypocotyl and Plumule, germinating seeds"
/lab_host="DH10B"
/note="Vector: pT7T3Pac (Pharmacia); Site_1: EcoRI;
Site_2: NotI; This cDNA library was constructed from mRNA
isolated from hypocotyl and plumule tissues of seeds
germinated for three days of the cultivar Williams.
Complementary DNA was synthesized from mRNA using a primer
consisting of a poly(dT) sequence with a NotI restriction
site. EcoRI adapters were ligated to the blunt-ended cDNA
fragments followed by digestion with EcoRI and NotI. The
cDNA fragments were directionally cloned into the
EcoRI-NotI restriction site of the pT7T3-Pac vector. The
ligated cDNA fragments were transformed into DH10B host
cells (Gibco BRL). This library was constructed by Dr.
Randy Shoemaker."
BASE COUNT 133 a 87 c 115 g 127 t
ORIGIN



Query Match 22.5%; Score 94.8; DB 11; Length 462;
Best Local Similarity 61.6%; Pred. No. 1e-17;
Matches 186; Conservative 0; Mismatches 112; Indels 4; Gaps 2;

Qy     118 aggacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcag 177
          |||| | |  | |  | |||   || |  ||  | | ||| ||| | ||| |||| |  |
Db      98 AGGATAAACTTATGATTGAAGAACCAGAATTTGAGGAATTTTGGCAGAGGGATGTGGAGG 157

Qy     178 agtctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatct 237
          |||| || | | || ||| |   | | |||||| | || |||||||| ||||| |||||
Db     158 AGTCAGTTC-GTCAGGGAAACATACGCCCATTTATAGAAGAAGCTGTTCTGCAGGTATCA 216

Qy     238 gactggggtttcagcctatctgacatccaactgcagaagaa   agaggctcaaggcttt 294
           | ||||||||   |||    ||  | ||  |||| |||||     |  |   |||| |
Db     217 AATTGGGGTTTTGACCTTAAGGAACTTCATGTGCAAAAGAAGTGTCAAACAAGAGGCATA 276

Qy     295 tttgaactcatcacgtctctgttcaatcatgctgaaaaacagtgggtgggatttctgggc 354
           ||       | |  ||  ||| || ||| || ||     | |  |  |||||||| |||
Db     277 CTTCTTTGGTTGAAATCCATGTACAGTCAGGCGGACTGTGAATTAGCAGGATTTCTTGGC 336

Qy     355 ccaatacatatatcgcaggggatagatgaccgagtgatctcgccctcagtggcagaattt 414
          |  | |||||||| ||||||  | |||||  | ||| || | || ||||||   |||| |
Db     337 CTTACACATATATGGCAGGGACTGGATGATAGGGTGGTCCCACCATCAGTGATGGAATAT 396

Qy     415 gt 416
           |
Db     397 AT 398



RESULT 4 
AW030046 

LOCUS AW030046 408 bp mRNA EST 18-MAY-2001 

DEFINITION EST273301 tomato callus, TAMU Lycopersicon esculentum cDNA clone 

cLEC16N4, mRNA sequence.
ACCESSION AW030046 
VERSION AW030046.1 GI:5888802 

KEYWORDS EST. 
SOURCE tomato. 

ORGANISM Lycopersicon esculentum 

Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
Asteridae; euasterids I; Solanales; Solanaceae; Solanum;
Lycopersicon.
REFERENCE 1 (bases 1 to 408) 

AUTHORS Alcala,J., Vrebalov,J., White,R., Matern,A.L., Vision,T., Holt,I.E.
, Liang,F., Upton,J., Craven,M.B., Bowman,C.L., Ahn,S., Ronning
,C.M., Fraser,C.M., Martin,G.B., Tanksley,S.D. and Giovannoni,J.
TITLE Generation of ESTs from tomato callus tissue 

JOURNAL Unpublished (1999) 
COMMENT Contact: CUGI 

Clemson University Genomics Institute 

Clemson University 

100 Jordan Hall, Clemson, SC 29634, USA 
Email: http://www.genome.clemson.edu/orders/index.html
5 prime sequence. 
FEATURES Location/Qualifiers 
source 1..408
/organism="Lycopersicon esculentum"
/cultivar="TA496"
/db_xref="taxon:4081"
/clone="cLEC16N4"
/clone_lib="tomato callus, TAMU"
/tissue_type="callus"
/dev_stage="25-40 days old"
/lab_host="XL1-Blue MRF'"
/note="Vector: pBlueScript SK(-); Site_1: EcoRI; Site_2:
XhoI; supplier: Giovannoni laboratory; cLEC - Cotyledons
of seedlings 7-10 days post-germination were excised, cut
at both ends and placed on MS medium with no selection.
Mixed callus was harvested at 25 and 40 days and included
undifferentiated masses. Tomato Callus EST Library"

BASE COUNT 122 a 85 c 87 g 114 t 

ORIGIN 



Query Match 17.5%; Score 73.8; DB 10; Length 408;
Best Local Similarity 59.9%; Pred. No. 1.8e-11;
Matches 142; Conservative 0; Mismatches 92; Indels 3; Gaps 1;

Qy     188 gccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatctgactggggtt 247
          | |||    ||||||  |||||||| || ||||||||  | || || ||  | ||||| |
Db       5 GACAAAAGAATGCAAAACCATTTGTAGAGGAAGCTGTCTTACAGGTTTCCAATTGGGGAT 64

Qy     248 tcagcctatctgacatccaactgcagaagaaagaggctc   aaggcttttttgaactca 304
          | || |   | ||| || || | |||| || |    ||    | ||  || |  |    |
Db      65 TTAGTCCTGCAGACCTCAAAGTACAGAGGACACGCACTGGGAAGGGTATTATGCATTGGA 124

Qy     305 tcacgtctctgttcaatcatgctgaaaaacagtgggtgggatttctgggcccaatacata 364
          | |  ||||| ||   |||  | ||  ||   | |   ||||| || || | |||||||
Db     125 TTAAATCTCTATTTGGTCAAACAGACGAAATCTTGACTGGATTCCTTGGTCAAATACATG 184

Qy     365 tatcgcaggggatagatgaccgagtgatctcgccctcagtggcagaatttgtgccgc 421
          ||| |||||| || || ||    ||| |  |||| ||       || ||  ||| ||
Db     185 TATGGCAGGGAATGGAAGATATGGTGGTACCGCCATCCACAAGTGATTTCTTGCAGC 241


RESULT 5 
BI271227 

LOCUS BI271227 658 bp mRNA EST 18-JUL-2001 

DEFINITION NF051E08FL1F1067 Developing flower Medicago truncatula cDNA clone 

NF051E08FL 5', mRNA sequence.
ACCESSION BI271227 

VERSION BI271227.1 GI:14879518 

KEYWORDS EST. 

SOURCE barrel medic. 

ORGANISM Medicago truncatula 

Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
Rosidae; eurosids I; Fabales; Fabaceae; Papilionoideae; Trifolieae;
Medicago.
REFERENCE 1 (bases 1 to 658) 

AUTHORS Torres-Jerez,I., Scott,A.D., Harris,A.R., Gonzales,R.A., Bell,C.J.,
Flores,H.R., Inman,J.T., Weller,J.W. and May,G.D.
TITLE Expressed Sequence Tags from the Samuel Roberts Noble Foundation 

Medicago truncatula flower library 
JOURNAL Unpublished (2001)
COMMENT Contact: May GD 

Plant Biology Division 
The Samuel Roberts Noble Foundation 
2510 Sam Noble Parkway, Ardmore, OK 73402, USA 
Tel: 580 221 7391 
Fax: 580 221 7380 
Email: gdmay@noble.org
Insert Length: 658 Std Error: 0.00
Plate: 051 row: E column: 08
Seq primer: TCACACAGGAAACAGCTATGAC.
FEATURES Location/Qualifiers 
source 1..658
/organism="Medicago truncatula"
/db_xref="taxon:3880"
/clone="NF051E08FL"
/clone_lib="Developing flower"
/tissue_type="Developing flowers"
/dev_stage="Developmentally pooled. Contains a mixture of
very young, developing, fully-opened flowers and flowers
in early transition into pods."
/note="Vector: Lambda Zap; cDNA was prepared from polyA+
enriched, pooled samples of equivalent amounts of total
RNA from very young, developing, fully-opened flowers and
flowers transitioning into pods. The cDNA was
directionally ligated into the Uni-Zap XR vector
(Stratagene) and packaged using the Gigapack III Gold
packaging extracts. Phagemids containing cDNA inserts were
in vivo excised from the recombinant Uni-ZAP XR vector
using ExAssist helper phage and the E. coli strain
XL1-Blue MRF' (Stratagene). Excised plasmids were plated
using SOLR cells."

BASE COUNT 191 a 122 c 160 g 183 t 2 others 

ORIGIN 



Query Match 15.1%; Score 63.6; DB 11; Length 658; 

Best Local Similarity 58.3%; Pred. No. 2.2e-08; 

Matches 147; Conservative 0; Mismatches 101; Indels 4; Gaps 2; 

Qy     118 aggacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcag 177
          ||||   || | |  | ||    || |  ||| | |  |  ||| | ||| || | |  |
Db     408 AGGATGAAATTCTCGTCGATGAACCAGCATTCGAAGAGTATTGGCAGAGGGATCTGGAGG 467

Qy     178 agtctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatct 237
          ||||||| | | || ||| |    | ||| ||| | || |||||| | ||||| ||||||
Db     468 AGTCTGTTC-GGCAGGGAAACCTGAAGCCGTTTATAGAGGAAGCTCTTCTGCAGGTATCT 526

Qy     238 gactggggtttcagcctatctgacatccaactgcagaagaa   agaggctcaaggcttt 294
             |||| ||||| | ||   ||  | ||  |||| |||||     |  |   ||| ||
Db     527 AGATGGGATTTCAACATANAAGAACTTCATGTGCATAAGAAGTGTCAAACAGGAGGATTA 586

Qy     295 tttgaactcatcacgtctctgttcaatcatgctgaaaaacagtgggtgggatttctgggc 354
           ||       | |  ||  ||| |  ||| || |||    | |     |||| ||| |||
Db     587 CTTCTTTGGTTGAAATCCATGTACGGTCAGGCAGAATGTGAATTANCAGGATATCTCGGC 646

Qy     355 ccaatacatata 366
          |  ||||| |||
Db     647 CGTATACACATA 658



RESULT 6 

BG606545 

LOCUS BG606545 595 bp mRNA EST 17-APR-2001
DEFINITION WHE2957_G02_N03ZS Wheat dormant embryo cDNA library Triticum
aestivum cDNA clone WHE2957_G02_N03, mRNA sequence.
ACCESSION BG606545
VERSION BG606545.1 GI:13656528
KEYWORDS EST.
SOURCE bread wheat.
ORGANISM Triticum aestivum
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; Pooideae
; Triticeae; Triticum.
REFERENCE 1 (bases 1 to 595)
AUTHORS Anderson,O.D., Chao,S., Chin,A., Close,T.J., Doherty,L., Fenton
,R.D., Lazo,G.R., Rausch,C.J., Walker-Simmons,M.K. and Wilson,C.
TITLE The structure and function of the expressed portion of the wheat
genomes - Dormant embryo cDNA library
JOURNAL Unpublished (2001)
COMMENT Contact: Olin Anderson
US Department of Agriculture, Agriculture Research Service, Pacific
West Area, Western Regional Research Center
800 Buchanan Street, Albany, CA 94710, USA
Tel: 5105595773
Fax: 5105595818
Email: oandersn@pw.usda.gov
Sequence have been trimmed to remove vector sequence and low
quality sequence with phred score less than 20
Seq primer: Stratagene SK primer.
FEATURES Location/Qualifiers
source 1..595
/organism="Triticum aestivum"
/cultivar="Brevor"
/db_xref="taxon:4565"
/clone="WHE2957_G02_N03"
/clone_lib="Wheat dormant embryo cDNA library"
/tissue_type="Seed embryo"
/dev_stage="Mature seed"
/lab_host="E. coli SOLR"
/note="Vector: Lambda Uni-ZAP XR, excised phagemid;
Site_1: EcoRI; Site_2: XhoI; Plants were grown to seed
maturity under conditions favoring seed dormancy (L.
Dohery at K. Walker_Simmons lab, Washington State
University, Pullman, WA). Embryos were cut from mature
dormant seed (Doherty). Total RNA was prepared from these
embryos, polyA was purified, a cDNA library was made, and
the cDNA clones were in vivo excised to give pBluescript
phagemids in the TJ Close lab at the University of
California, Riverside (Chin, Fenton). Plasmid DNA
preparations and DNA sequencing were performed in the OD
Anderson lab (all other authors)."
BASE COUNT 149 a 131 c 148 g 166 t 1 others
ORIGIN



Query Match 13.3%; Score 56.2; DB 11; Length 595; 

Best Local Similarity 88.4%; Pred. No. 3.5e-06; 

Matches 61; Conservative 0; Mismatches 8; Indels 0; Gaps 0; 

Qy     118 aggacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcag 177
          |||||| ||| |||||||||  |||| ||||||| ||||||||||||| | |||||||||
Db     525 AGGACAAAACCTTACTGGAAGCTCCTATGTTCAACGCATTCTGGGAAAAGGATGTTGCAG 584

Qy     178 agtctgtgc 186
          |||||||||
Db     585 AGTCTGTGC 593



RESULT 7 

BI271547 

LOCUS BI271547 643 bp mRNA EST 18-JUL-2001
DEFINITION NF057E12FL1F1099 Developing flower Medicago truncatula cDNA clone
NF057E12FL 5', mRNA sequence.
ACCESSION BI271547
VERSION BI271547.1 GI:14880151
KEYWORDS EST.
SOURCE barrel medic.
ORGANISM Medicago truncatula
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
Rosidae; eurosids I; Fabales; Fabaceae; Papilionoideae; Trifolieae;
Medicago.
REFERENCE 1 (bases 1 to 643)
AUTHORS Torres-Jerez,I., Scott,A.D., Harris,A.R., Gonzales,R.A., Bell,C.J.,
Flores,H.R., Inman,J.T., Weller,J.W. and May,G.D.
TITLE Expressed Sequence Tags from the Samuel Roberts Noble Foundation
Medicago truncatula flower library
JOURNAL Unpublished (2001)
COMMENT Contact: May GD
Plant Biology Division
The Samuel Roberts Noble Foundation
2510 Sam Noble Parkway, Ardmore, OK 73402, USA
Tel: 580 221 7391
Fax: 580 221 7380
Email: gdmay@noble.org
Insert Length: 643 Std Error: 0.00
Plate: 057 row: E column: 12
Seq primer: TCACACAGGAAACAGCTATGAC.
FEATURES Location/Qualifiers
source 1..643
/organism="Medicago truncatula"
/db_xref="taxon:3880"
/clone="NF057E12FL"
/clone_lib="Developing flower"
/tissue_type="Developing flowers"
/dev_stage="Developmentally pooled. Contains a mixture of
very young, developing, fully-opened flowers and flowers
in early transition into pods."
/note="Vector: Lambda Zap; cDNA was prepared from polyA+
enriched, pooled samples of equivalent amounts of total
RNA from very young, developing, fully-opened flowers and
flowers transitioning into pods. The cDNA was
directionally ligated into the Uni-Zap XR vector
(Stratagene) and packaged using the Gigapack III Gold
packaging extracts. Phagemids containing cDNA inserts were
in vivo excised from the recombinant Uni-ZAP XR vector
using ExAssist helper phage and the E. coli strain
XL1-Blue MRF' (Stratagene). Excised plasmids were plated
using SOLR cells."
BASE COUNT 186 a 118 c 158 g 180 t 1 others
ORIGIN



Query Match 13.0%; Score 54.6; DB 11; Length 643;
Best Local Similarity 62.7%; Pred. No. 1.1e-05;
Matches 101; Conservative 0; Mismatches 59; Indels 1; Gaps 1;

Qy     118 aggacagaactttactggaacgtcctgtgttcaatgcattctgggaaaggaatgttgcag 177
          ||||   || | |  | ||    || |  ||| | |  |  ||| | ||| || | |  |
Db     408 AGGATGAAATTCTCGTCGATGAACCAGCATTCGAAGAGTATTGGCAGAGGGATCTGGAGG 467

Qy     178 agtctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatct 237
          ||||||| | | || ||| |    | ||| ||| | || |||||| | ||||| ||||||
Db     468 AGTCTGTTC-GGCAGGGAAACCTGAAGCCGTTTATAGAGGAAGCTCTTCTGCAGGTATCT 526

Qy     238 gactggggtttcagcctatctgacatccaactgcagaagaa 278
             |||| ||||| | ||   ||  | ||  |||| |||||
Db     527 AGATGGGATTTCAACATAGAAGAACTTCATGTGCATAAGAA 567



RESULT 8 
BG321058/C 
LOCUS BG321058 595 bp mRNA EST 27-FEB-2001
DEFINITION Zm04_01a12_A Zm04_AAFC_ECORC_cold_stressed_maize_seedlings Zea mays
cDNA clone Zm04_01a12, mRNA sequence.
ACCESSION BG321058
VERSION BG321058.1 GI:13150736
KEYWORDS EST.
SOURCE Zea mays.
ORGANISM Zea mays
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
clade; Panicoideae; Andropogoneae; Zea.
REFERENCE 1 (bases 1 to 595)
AUTHORS Singh,J.A., Wakui,K., Couroux,P., De Moors,A., Harris,L.J., Hattori
,J.I., Ouellet,T., Robert,L.S., Sprott,D. and Tinker,N.A.
TITLE Expressed Sequence Tags from Cold-Stressed Maize Seedlings
JOURNAL Unpublished (2001)
COMMENT Contact: Singh, J.A.
Eastern Cereal and Oilseed Research Centre
Agriculture and Agri-food Canada
960 Carling Avenue, Bldg. 20, Ottawa, Ontario, K1A 0C6, Canada
Tel: (613) 759-1662
Fax: (613) 759-1701
Email: singhja@em.agr.ca.
FEATURES Location/Qualifiers
source 1..595
/organism="Zea mays"
/cultivar="C0328"
/db_xref="taxon:4577"
/clone="Zm04_01a12"
/clone_lib="Zm04_AAFC_ECORC_cold_stressed_maize_seedlings"
/tissue_type="Leaf, crown"
/note="Vector: Bluescript SK-/XhoI-EcoRI; Site_1: Eco RI;
Site_2: Xho I; Lower temperature 5oC / hour from 22 to
12oC; bring to 5o in 1 hour from 12oC. Leave at 5oC 2 days
, photoperiod 16 hours. Light intensity was 125 uE-1.
Library prepared by in vivo mass excision from amplified
library."
BASE COUNT 134 a 149 c 156 g 149 t 7 others
ORIGIN



Query Match 12.9%; Score 54.2; DB 11; Length 595; 

Best Local Similarity 88.1%; Pred. No. 1.4e-05; 

Matches 59; Conservative 0; Mismatches 8; Indels 0; Gaps 0; 

Qy     355 ccaatacatatatcgcaggggatagatgaccgagtgatctcgccctcagtggcagaattt 414
          ||||||||||||| ||||||||| || ||||||||| |||||||  ||||||| ||||||
Db     595 CCAATACATATATGGCAGGGGATGGACGACCGAGTGGTCTCGCCGGCAGTGGCCGAATTT 536

Qy     415 gtgccgc 421
          |||| ||
Db     535 GTGCGGC 529



RESULT 9 
BG507333 

LOCUS BG507333 479 bp mRNA EST 28-MAR-2001 

DEFINITION sac57f11.y1 Gm-c1062 Glycine max cDNA clone GENOME SYSTEMS CLONE
ID: Gm-c1062-4125 5' similar to TR:Q9SF34 Q9SF34 F11F8.28 PROTEIN.
;, mRNA sequence.
ACCESSION BG507333
VERSION BG507333.1 GI:13477451
KEYWORDS EST. 
SOURCE soybean. 
ORGANISM Glycine max 

Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
Rosidae; eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae;
Glycine.

REFERENCE 1 (bases 1 to 479) 

AUTHORS Shoemaker,R., Keim,P., Vodkin,L., Erpelding,J., Coryell,V., Khanna
,A., Bolla,B., Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C.,
Wylie,T., Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers
,Y., Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk
,R., Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann
,R., Waterston,R. and Wilson,R.

TITLE Public Soybean EST Project 

JOURNAL Unpublished (1999) 
COMMENT Contact: Shoemaker R/Public Soybean EST Project 

Public Soybean EST Project 
Washington University School of Medicine 

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA 

Tel: 314 286 1800 

Fax: 314 286 1810 

Email: est@watson.wustl.edu

This clone is available through: Genome Systems, Inc. 4633 World 
Parkway Circle St. Louis, Missouri 63134 For further information 
call: (800) 430-0030 or (314) 427-3222 FAX: (888) 919-3324 or (314) 
427-3324 or contact: clones@genomesystems.com or 
info@genomesystems.com web site: www.genomesystems.com 
High quality sequence stop: 438. 
FEATURES Location/Qualifiers 
source 1..479
/organism="Glycine max"
/db_xref="taxon:3847"
/clone="GENOME SYSTEMS CLONE ID: Gm-c1062-4125"
/clone_lib="Gm-c1062"
/tissue_type="stem tissue of greenhouse grown plants"
/dev_stage="1 month old"
/lab_host="DH10B"
/note="Vector: pBluescript II SK+; Site_1: EcoRI; Site_2:
XhoI; The cDNA library was constructed from mRNA isolated
from stem tissue of 1 month old greenhouse grown plants
for the cultivar Raiden. Complementary DNA was
synthesized from mRNA using a primer consisting of a
poly(dT) sequence with a XhoI restriction site. EcoRI
adapters were ligated to the blunt-ended cDNA fragments
followed by XhoI digestion. The cDNA fragments were
directionally cloned into the EcoRI-XhoI restriction site
of the pBluescript vector. The ligated cDNA fragments were
transformed into DH10B host cells (GibcoBRL). This library
was constructed in the laboratory of Dr. Randy Shoemaker."

BASE COUNT 153 a 99 c 89 g 137 t 1 others 

ORIGIN 



Query Match 11.3%; Score 47.6; DB 11; Length 479; 

Best Local Similarity 75.6%; Pred. No. 0.0012; 

Matches 59; Conservative 0; Mismatches 19; Indels 0; Gaps 0;

Qy     341 tgggatttctgggcccaatacatatatcgcaggggatagatgaccgagtgatctcgccct 400
          |||||||||| ||||| |||||||||| ||| || || |||||   |||| |  | || |
Db      92 TGGGATTTCTTGGCCCTATACATATATGGCAAGGAATGGATGATAAAGTGGTTCCTCCAT 151

Qy     401 cagtggcagaatttgtgc 418
          |  || | || |||||||
Db     152 CGATGACTGATTTTGTGC 169



RESULT 10 

AL372617 

LOCUS AL372617 518 bp mRNA EST 03-AUG-2000
DEFINITION MtBA52C12R1 MtBA Medicago truncatula cDNA clone MtBA52C12 T7, mRNA
sequence.
ACCESSION AL372617
VERSION AL372617.1 GI:9672370
KEYWORDS EST.
SOURCE barrel medic.
ORGANISM Medicago truncatula
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
Rosidae; eurosids I; Fabales; Fabaceae; Papilionoideae; Trifolieae;
Medicago.
REFERENCE 1 (bases 1 to 518)
AUTHORS Journet,E.P., Crespeau,H., van-Tuinen,D., Gouzy,J., Jaillon,O.,
Niebel,A., Carreau,V., Chatagnier,O., Kahn,D., Gianinazzi-Pearson
,V. and Gamas,P.
TITLE Medicago truncatula ESTs from nitrogen-starved roots
JOURNAL Unpublished (2000)
COMMENT Contact: Genoscope
Genoscope - Centre National de Sequencage
BP 191 91006 EVRY cedex - France
Email: seqref@genoscope.cns.fr, Web: www.genoscope.cns.fr
Contact: Pascal Gamas and Etienne-Pascal Journet, Laboratoire de
Biologie Moleculaire des Relations Plantes-Microorganismes,
CNRS-INRA, BP 27 31326 Castanet-Tolosan Cedex, France (Email:
Mt-est@toulouse.inra.fr Website:
http://sequence.toulouse.inra.fr/Mtruncatula.html).
FEATURES Location/Qualifiers
source 1..518
/organism="Medicago truncatula"
/cultivar="Jemalong"
/db_xref="taxon:3880"
/clone="MtBA52C12"
/clone_lib="MtBA"
/tissue_type="root tips"
/dev_stage="harvested after 3 days of N-starvation"
/note="Vector: pBluescript pSK; Site_1: EcoRI; Site_2:
XhoI; Plants were grown in an aeroponic chamber for 14
days on nitrogen-rich medium followed by 3 days on N-free
medium. RNA was extracted from root tips (1-3 cm). cDNA
was prepared from polyA+ enriched RNA. The cDNA was
directionally ligated into Uni-zapXR vector from
Stratagene and packaged using Gigapack Gold packaging
extracts. Plasmids containing cDNA inserts were
mass-excised from phage stocks using ExAssit helper phage
and propagated in SOLR cells. Clone ordering and
sequencing was performed by the Centre National de
Sequencage (Genoscope, Evry, France)."

BASE COUNT 174 a 112 c 95 g 137 t 

ORIGIN 



Query Match 10.5%; Score 44; DB 10; Length 518; 

Best Local Similarity 73.7%; Pred. No. 0.015; 

Matches 56; Conservative 0; Mismatches 20; Indels 0; Gaps 0; 

Qy     343 ggatttctgggcccaatacatatatcgcaggggatagatgaccgagtgatctcgccctca 402
          || ||||| || |||||||| |||| ||| || || || |||  |||| |  | || |||
Db      19 GGCTTTCTCGGTCCAATACACATATGGCAAGGAATGGACGACAAAGTGGTTCCCCCATCA 78

Qy     403 gtggcagaatttgtgc 418
           || | || |||||||
Db      79 ATGACTGATTTTGTGC 94



RESULT 11 

AZ406647 

LOCUS AZ406647 414 bp DNA GSS 03-OCT-2000
DEFINITION 1M0175P21R Mouse 10kb plasmid UUGC1M library Mus musculus genomic
clone UUGC1M0175P21 R, DNA sequence.
ACCESSION AZ406647
VERSION AZ406647.1 GI:10530660
KEYWORDS GSS.
SOURCE house mouse.
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 414)
AUTHORS Dunn,D., Aoyagi,A., Barber,M., Beacorn,T., Duval,B., Hamil,C.,
Islam,H., Longacre,S., Mahmoud,M., Meenen,E., Pedersen,T., Reilly
,M., Rose,M., Rose,R., Stokes,R., Tingey,A., von Niederhausern,A.
and Wright,D., Weiss,R.
TITLE Mouse whole genome scaffolding with paired end reads from 10kb
plasmid inserts
JOURNAL Unpublished (2000)
COMMENT Contact: Robert B. Weiss
University of Utah Genome Center
University of Utah
Rm. 308, Biomedical Polymers Research Bldg., 20 S. 2030 E., SLC, UT
84112, USA
Tel: 801 585 5606
Fax: 801 585 7177
Email: ddunn@genetics.utah.edu
Insert Length: 10000 Std Error: 0.00
Plate: 0175 row: P column: 21
Seq primer: CACACAGGAAACAGCTATGACC
Class: plasmid ends
High quality sequence stop: 414.
FEATURES Location/Qualifiers
source 1..414
/organism="Mus musculus"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="UUGC1M0175P21"
/clone_lib="Mouse 10kb plasmid UUGC1M library"
/sex="Male"
/lab_host="E. Coli strain XL10-Gold, T1-resistant, F-"
/note="Vector: PWD42nv; Purified genomic DNA from M.
musculus C57BL/6J (male) was obtained from the Jackson
Laboratory Mouse DNA Resource
(http://www.jax.org/resources/documents/dnares/). The DNA
was hydrodynamically sheared by repeated passage through a
0.005 inch orifice at constant velocity. The sheared DNA
was blunt end-repaired with T4 DNA polymerase and T4
polynucleotide kinase. Adaptor oligonucleotides were
ligated to the blunt ends in high molar excess. The
adaptored DNA was purified and size-selected for a 9.5 to
10.5 kb range using preparative agarose gel
electrophoresis. Vector DNA was prepared from a derivative
of pWD42 (gi|4732114|gb|AF129072.1), a copy-number
inducible derivative of plasmid R1. The vector was ligated
with adaptors complementary to the insert adaptors and
purified. The sheared, adaptored mouse DNA was annealed to
adaptored vector DNA, and transformed into
chemically-competent E. coli XL10-Gold (Stratagene) cells
and selected for ampicillin resistance."
BASE COUNT 123 a 99 c 59 g 133 t
ORIGIN



Query Match 9.4%; Score 39.4; DB 13; Length 414;
Best Local Similarity 62.9%; Pred. No. 0.33;
Matches 61; Conservative 0; Mismatches 36; Indels 0; Gaps 0;

Qy      30 agcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgccttttcgcgt 89
          | |||  | ||  |   | |  || |||   || | ||| ||| ||    |||||| | |
Db      79 ACCTTTGCCATGGTGGATTGATACTCCTGAAATCATATGCCAAAATAAATCTTTTCTCTT 138

Qy      90 tccaattactaatgttacggcattattcaggacagaa 126
          |  | ||||| |||  | || ||| |||||||||| |
Db     139 TTAAGTTACTTATGCCAAGGAATTTTTCAGGACAGCA 175


RESULT 12 
AQ083972 



LOCUS AQ083972 370 bp DNA GSS 26-AUG-1998
DEFINITION HS_2226_B2_E10_MF CIT Approved Human Genomic Sperm Library D Homo
sapiens genomic clone Plate=2226 Col=20 Row=J, DNA sequence.
ACCESSION AQ083972
VERSION AQ083972.1 GI:3452889
KEYWORDS GSS.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 370)
AUTHORS Mahairas,G.G., Wallace,J.C., Smith,K., Swartzell,S., Holzman,T.,
Keller,A., Shaker,R., Furlong,J., Young,J., Zhao,S., Adams,M.D. and
Hood,L.
TITLE Sequence-tagged connectors: A sequence approach to mapping and
scanning the human genome
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 96 (17), 9739-9744 (1999)
MEDLINE 99380589
COMMENT Contact: Mahairas GG, Wallace JC, Hood L
High Throughput Sequencing Center
University of Washington
401 Queen Anne Avenue North, Seattle, WA 98109, USA
Tel: (206) 616-3618
Fax: (206) 616-3887
Email: jwallace@u.washington.edu
Sequence Tagged Connector
Plate: 2226 row: J column: 20
Class: BAC ends
High quality sequence stop: 370.
FEATURES Location/Qualifiers
source 1..370
/organism="Homo sapiens"
/db_xref="taxon:9606"
/clone="Plate=2226 Col=20 Row=J"
/clone_lib="CIT Approved Human Genomic Sperm Library D"
/sex="male"
/note="Organ: sperm; Vector: pBeloBAC11; BAC Clones in
E-Coli DH10B"
BASE COUNT 110 a 100 c 65 g 95 t
ORIGIN



Query Match 9.3%; Score 39; DB 13; Length 370; 

Best Local Similarity 65.5%; Pred. No. 0.42; 

Matches 57; Conservative 0; Mismatches 30; Indels 0; Gaps 0; 

Qy      36 acaattcttagtagtcaccccttcgattaaatgtcaacatttgccttttcgcgttccaat 95
          | || ||  | | ||||   || | | |   || ||||||   ||| ||||| |||  ||
Db     250 AGAACTCCCAATTGTCAGAGCTACAAATCCCTGGCAACATAGCCCTGTTCGCATTCACAT 309

Qy      96 tactaatgttacggcattattcaggac 122
          | |||||||| | || ||| || || |
Db     310 TCCTAATGTTCCTGCCTTAATCCGGCC 336



RESULT 13 
AZ873903 



LOCUS AZ873903 584 bp DNA GSS 21-FEB-2001 

DEFINITION 2M0187P15R Mouse 10kb plasmid UUGC1M library Mus musculus genomic
clone UUGC2M0187P15 R, DNA sequence.
ACCESSION AZ873903 

VERSION AZ873903.1 GI:13082429 

KEYWORDS GSS. 
SOURCE house mouse. 

ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 584) 

AUTHORS Dunn,D., Aoyagi,A., Barber,M., Beacorn,T., Duval,B., Hamil,C.,
Islam,H., Longacre,S., Mahmoud,M., Meenen,E., Pedersen,T., Reilly
,M., Rose,M., Rose,R., Stokes,R., Tingey,A., von Niederhausern,A.
and Wright,D., Weiss,R.
TITLE Mouse whole genome scaffolding with paired end reads from 10kb
plasmid inserts
JOURNAL Unpublished (2000) 
COMMENT Contact: Robert B. Weiss 

University of Utah Genome Center 
University of Utah 

Rm. 308, Biomedical Polymers Research Bldg., 20 S. 2030 E., SLC, UT 
84112, USA 
Tel: 801 585 5606 
Fax: 801 585 7177 
Email: ddunn@genetics.utah.edu 
Insert Length: 10000 Std Error: 0.00 
Plate: 0187 row: P column: 15 
Seq primer: CACACAGGAAACAGCTATGACC 
Class: plasmid ends 
High quality sequence stop: 584. 
FEATURES Location/Qualifiers 
source 1..584
/organism="Mus musculus"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="UUGC2M0187P15"
/clone_lib="Mouse 10kb plasmid UUGC1M library"
/sex="Male"
/lab_host="E. Coli strain XL10-Gold, T1-resistant, F-"
/note="Vector: PWD42nv; Purified genomic DNA from M.
musculus C57BL/6J (male) was obtained from the Jackson
Laboratory Mouse DNA Resource
(http://www.jax.org/resources/documents/dnares/). The DNA
was hydrodynamically sheared by repeated passage through a
0.005 inch orifice at constant velocity. The sheared DNA
was blunt end-repaired with T4 DNA polymerase and T4
polynucleotide kinase. Adaptor oligonucleotides were
ligated to the blunt ends in high molar excess. The
adaptored DNA was purified and size-selected for a 9.5 to
10.5 kb range using preparative agarose gel
electrophoresis. Vector DNA was prepared from a derivative
of pWD42 (gi|4732114|gb|AF129072.1), a copy-number
inducible derivative of plasmid R1. The vector was ligated
with adaptors complementary to the insert adaptors and
purified. The sheared, adaptored mouse DNA was annealed to
adaptored vector DNA, and transformed into
chemically-competent E. coli XL10-Gold (Stratagene) cells
and selected for ampicillin resistance."
BASE COUNT 144 a 92 c 182 g 166 t
ORIGIN



Query Match 8.6%; Score 36.4; DB 13; Length 584; 

Best Local Similarity 54.5%; Pred. No. 2.8; 

Matches 73; Conservative 0; Mismatches 61; Indels 0; Gaps 0; 

Qy      94 attactaatgttacggcattattcaggacagaactttactggaacgtcctgtgttcaatg 153
          | |  | |||     ||||   || |  ||| |    ||  | | || |     ||  ||
Db     429 AGTGATCATGGATGTGCATATGTCTGTGCAGCAAGGAACATGGATGTGCATATGTCTGTG 488

Qy     154 cattctgggaaaggaatgttgcagagtctgtgcagccaaggagatgcaaggccatttgtg 213
          ||    || | | | ||||    | |||||||||||   | | ||| | |  ||| |||
Db     489 CAGCAAGGAACATGGATGTGCATGTGTCTGTGCAGCAGGGAACATGGATGTGCATGTGTC 548

Qy     214 gacgaagctgtgct 227
           | | ||| ||| |
Db     549 TATGTAGCAGTGAT 562



RESULT 14 
AW280435/C 
LOCUS AW280435 596 bp mRNA EST 04-JAN-2000
DEFINITION fj40d12.y1 zebrafish adult brain Danio rerio cDNA 5' similar to
SW:AP19_HUMAN P56377 CLATHRIN COAT ASSEMBLY PROTEIN AP19 ;, mRNA
sequence.
ACCESSION AW280435
VERSION AW280435.1 GI:6668984
KEYWORDS EST.
SOURCE zebrafish.
ORGANISM Danio rerio
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Actinopterygii; Neopterygii; Teleostei; Euteleostei; Ostariophysi;
Cypriniformes; Cyprinidae; Rasborinae; Danio.
REFERENCE 1 (bases 1 to 596)
AUTHORS Clark,M., Johnson,S.L., Lehrach,H., Lee,R., Li,F., Marra,M., Eddy
,S., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T., Underwood
,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y., Person,B.,
Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R., Ritter,E.,
Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R., Waterston,R.
and Wilson,R.
TITLE WashU Zebrafish EST Project 1998
JOURNAL Unpublished (1998)
COMMENT Other_ESTs: fj40d12.x1
Contact: Stephen L. Johnson
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
Tel: 314 286 1800
Fax: 314 286 1810
Email: zbrafish@watson.wustl.edu
cDNA Library Preparation: John Ngai. cDNA Library Arrayed by:
Matthew Clark. DNA Sequencing by: Washington University Genome
Sequencing Center Clone distribution: Genome Systems, St. Louis,
Missouri (web address: www.genomesystems.com) (email contact:
info@genomesystems.com) and Research Genetics, Huntsville, Alabama
(web address: www.resgen.com) (email contact: info@resgen.com) and
RessourcenZentrumPrimarDatenbank, Berlin, Germany (web address:
www.rzpd.de)
Seq primer: T3 ET from Amersham
High quality sequence stop: 488.
FEATURES Location/Qualifiers
source 1..596
/organism="Danio rerio"
/db_xref="taxon:7955"
/clone_lib="zebrafish adult brain"
/sex="mixed male and female"
/tissue_type="brain"
/dev_stage="adult"
/lab_host="E. coli DH10B"
/note="Vector: pZIPLOX; Site_1: Not I; Site_2: SalI;
Original library was constructed in lambdaZIPLOX. Mass
excision of the cDNA library was performed to yield
pZIPLOX plasmids. Insert check was done in original
library."
BASE COUNT 154 a 163 c 164 g 115 t
ORIGIN



Query Match 8.6%; Score 36; DB 10; Length 596; 

Best Local Similarity 60.0%; Pred. No. 3.7; 

Matches 60; Conservative 0; Mismatches 40; Indels 0; Gaps 0; 

Qy     177 gagtctgtgcagccaaggagatgcaaggccatttgtggacgaagctgtgctgcaagtatc 236
          |||| ||||||  | ||     ||  ||| | |     ||    | | | || || |   |
Db     556 GAGTGTGTGCAAGCGAGTGTGTGTGAGGTCTTCATGTGAGTCCGATCTCCTCCAGGACGC 497

Qy     237 tgactggggtttcagcctatctgacatccaactgcagaag 276
          ||  ||||||||| || | | || | |||   |||||| ||
Db     496 TGCGTGGGGTTTCTGCTTCTTTGGCGTCCTCCTGCAGTAG 457



RESULT 15
AZ637668/C
LOCUS       AZ637668     498 bp    DNA             GSS       13-DEC-2000
DEFINITION  1M0497K12F Mouse 10kb plasmid UUGC1M library Mus musculus genomic
            clone UUGC1M0497K12 F, DNA sequence.
ACCESSION   AZ637668
VERSION     AZ637668.  GI:11759858
KEYWORDS    GSS.
SOURCE      house mouse.
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 498)
  AUTHORS   Dunn,D., Aoyagi,A., Barber,M., Beacorn,T., Duval,B., Hamil,C.,
            Islam,H., Longacre,S., Mahmoud,M., Meenen,E., Pedersen,T.,
            Reilly,M., Rose,M., Rose,R., Stokes,R., Tingey,A.,
            von Niederhausern,A. and Wright,D., Weiss,R.
  TITLE     Mouse whole genome scaffolding with paired end reads from 10kb
            plasmid inserts
  JOURNAL   Unpublished (2000)
COMMENT     Contact: Robert B. Weiss
            University of Utah Genome Center
            University of Utah
            Rm. 308, Biomedical Polymers Research Bldg.,,20 S. 2030 E., SLC, UT
            84112, USA
            Tel: 801 585 5606
            Fax: 801 585 7177
            Email: ddunn@genetics.utah.edu
            Insert Length: 10000 Std Error: 0.00
            Plate: 0497 row: K column: 12
            Seq primer: CGTTGTAAAACGACGGCCAGT
            Class: plasmid ends
            High quality sequence stop: 498.
FEATURES             Location/Qualifiers
     source          1..498
                     /organism="Mus musculus"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="UUGC1M0497K12"
                     /clone_lib="Mouse 10kb plasmid UUGC1M library"
                     /sex="Male"
                     /lab_host="E. Coli strain XL10-Gold, T1-resistant, F-"
                     /note="Vector: PWD42nv; Purified genomic DNA from M.
                     musculus C57BL/6J (male) was obtained from the Jackson
                     Laboratory Mouse DNA Resource
                     (http://www.jax.org/resources/documents/dnares/). The DNA
                     was hydrodynamically sheared by repeated passage through a
                     0.005 inch orifice at constant velocity. The sheared DNA
                     was blunt end-repaired with T4 DNA polymerase and T4
                     polynucleotide kinase. Adaptor oligonucleotides were
                     ligated to the blunt ends in high molar excess. The
                     adaptored DNA was purified and size-selected for a 9.5 to
                     10.5 kb range using preparative agarose gel
                     electrophoresis. Vector DNA was prepared from a derivative
                     of pWD42 (gi|4732114|gb|AF129072.1), a copy-number
                     inducible derivative of plasmid R1. The vector was ligated
                     with adaptors complementary to the insert adaptors and
                     purified. The sheared, adaptored mouse DNA was annealed to
                     adaptored vector DNA, and transformed into
                     chemically-competent E. coli XL10-Gold (Stratagene) cells
                     and selected for ampicillin resistance."
BASE COUNT      169 a    107 c     81 g    141 t
ORIGIN



Query Match 8.4%; Score 35.4; DB 13; Length 498; 

Best Local Similarity 50.9%; Pred. No. 5.4; 

Matches 84; Conservative 0; Mismatches 81; Indels 0; Gaps 0; 

Qy  23 tgaggttagcttaacaattcttagtagtcaccccttcgattaaatgtcaacatttgcctt 82
       |||   ||| ||     |  |  ||||  || ||||| | ||  ||     | ||  |||
Db 223 TGAACCTAGTTTTCTCGTGTTCTGTAGGAACGCCTTCTAATAGTTGATCTGAGTTTTCTT 164

Qy  83 ttcgcgttccaattactaatgttacggcattattcaggacagaactttactggaacgtcc 142
       ||   || |   ||   | ||| |    |  || ||| |    || | |   |||  || 
Db 163 TTATTGTGCTGTTTTACACTGTGATAATAAAATGCAGCAATTCACATCAGCTGAAGATCA 104

Qy 143 tgtgttcaatgcattctgggaaaggaatgttgcagagtctgtgca 187
           || | |  ||    |||| |||||     |||||| | | |
Db 103 GAGTTTGATTAAATGGAAGGAAGGGAATCACAGAGAGTCAGAGAA 59



Search completed: February 7, 2002, 08:20:41 
Job time: 18118 sec 



